Here is our podcast episode with Sergey Levine from UC Berkeley, in which we discussed the evolution of deep reinforcement learning, how previous robotics approaches were replaced, and why offline RL is significant for future generalization.
submitted by /u/thejashGI
( 43
min )
The race toward sentient AI is on. A combination of hubris and competition between governments and societies akin to an arms race virtually ensures ‘sentient’ AI/AGI/ASI will be developed in relatively short order. There is increasing evidence, such as the Othello paper, that is already upending the auto-complete narrative. LLMs having a world model implies theory of mind, and thus at least functional consciousness (albeit quantized for the time being), which likely in turn confers some form of partial, non-anthropomorphic sentience; that will at some point open an ethical, societal, and religious Pandora’s box (see the Bodhisattva vow). The only thing we don’t know is just how far down this slippery slope we are at the moment. It’s also hard to argue against the runaway AI effect in …
( 43
min )
Financial market participants are faced with an overload of information that influences their decisions, and sentiment analysis stands out as a useful tool to help separate out the relevant and meaningful facts and figures. However, the same piece of news can have a positive or negative impact on stock prices, which presents a challenge for […]
( 14
min )
Amazon Kendra is an easy-to-use intelligent search service that allows you to integrate search capabilities with your applications so users can find information stored across data sources like Amazon Simple Storage Service (Amazon S3), OneDrive, and Google Drive; applications such as Salesforce, SharePoint, and ServiceNow; and relational databases like Amazon Relational Database Service (Amazon RDS). Using […]
( 9
min )
March is already here and a new month always means new games, with a total of 19 joining the GeForce NOW library. Set off on a magical journey to restore Disney magic when Disney Dreamlight Valley joins the cloud later this month. Plus, the hunt is on with Capcom’s Monster Hunter Rise now available.
( 6
min )
A process that seeks feedback from human specialists proves more effective at optimization than automated systems working alone.
( 9
min )
Appendicitis is among the most frequent reasons for pediatric abdominal
surgeries. With recent advances in machine learning, data-driven decision
support could help clinicians diagnose and manage patients while reducing the
number of non-critical surgeries. Previous decision support systems for
appendicitis focused on clinical, laboratory, scoring and computed tomography
data, mainly ignoring abdominal ultrasound, a noninvasive and readily available
diagnostic modality. To address this gap, we developed and validated interpretable
machine learning models for predicting the diagnosis, management and severity
of suspected appendicitis using ultrasound images. Our models were trained on a
dataset comprising 579 pediatric patients with 1709 ultrasound images
accompanied by clinical and laboratory data. Our methodological contribution is
the generalization of concept bottleneck models to prediction problems with
multiple views and incomplete concept sets. Notably, such models lend
themselves to interpretation and interaction via high-level concepts
understandable to clinicians without sacrificing performance or requiring
time-consuming image annotation when deployed.
( 2
min )
In computer vision, it is often observed that formulating regression problems
as a classification task yields better performance. We investigate this
curious phenomenon and provide a derivation to show that classification, with
the cross-entropy loss, outperforms regression with a mean squared error loss
in its ability to learn high-entropy feature representations. Based on the
analysis, we propose an ordinal entropy loss to encourage higher-entropy
feature spaces while maintaining ordinal relationships to improve the
performance of regression tasks. Experiments on synthetic and real-world
regression tasks demonstrate the importance and benefits of increasing entropy
for regression.
( 2
min )
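The regression-as-classification idea in the abstract above can be sketched concretely. The following is a minimal illustration with made-up toy data, not the paper's ordinal entropy loss: a continuous target is discretized into ordinal bins, a linear softmax model is trained with cross-entropy, and a continuous prediction is decoded as the probability-weighted bin centre.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * x[:, 0] + 0.1 * rng.normal(size=200)   # continuous regression target

K = 10
edges = np.linspace(y.min(), y.max() + 1e-9, K + 1)
labels = np.digitize(y, edges[1:-1])             # ordinal class index per sample, 0..K-1

# one linear layer trained with softmax cross-entropy via plain gradient descent
W = np.zeros((1, K)); b = np.zeros(K)
for _ in range(500):
    logits = x @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    g = (p - np.eye(K)[labels]) / len(x)         # dL/dlogits for cross-entropy
    W -= 1.0 * x.T @ g
    b -= 1.0 * g.sum(axis=0)

# recompute probabilities, then decode a continuous prediction
logits = x @ W + b
logits -= logits.max(axis=1, keepdims=True)
p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
centres = 0.5 * (edges[:-1] + edges[1:])
y_hat = p @ centres                              # probability-weighted bin centres
mse = float(np.mean((y_hat - y) ** 2))
print(mse)
```

The decoded predictions should beat the trivial predict-the-mean baseline (whose error equals the target variance), which is the phenomenon the paper analyzes.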
We propose a new high-performance activation function, Moderate Adaptive
Linear Units (MoLU), for the deep neural network. The MoLU is a simple,
beautiful and powerful activation function that can be a good main activation
function among hundreds of activation functions. Because the MoLU is made up of
elementary functions, not only is it an infinite diffeomorphism (i.e. smooth
and infinitely differentiable over the whole domain), but it also decreases
training time.
( 2
min )
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
( 2
min )
Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the
mean of the forward process ends at the noisy mixture, in practice it stops
earlier and thus only at an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy. To this end, we propose a forward process based on a
Brownian bridge and show that such a process leads to a reduction of the
mismatch compared to previous diffusion processes. More importantly, we show
that our approach improves in objective metrics over the baseline process with
only half the iteration steps and one fewer hyperparameter to tune.
( 2
min )
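A discrete Brownian bridge of the kind the abstract describes is easy to simulate. The sketch below uses toy signals and an assumed noise scale (not the paper's exact SDE): the process starts at a "clean" signal x0 and is pinned to the "noisy" mixture x1 at t = 1, so the forward process terminates exactly at the mixture rather than at an approximation of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, sigma = 4, 100, 0.5
x0 = rng.normal(size=n)          # clean signal (toy)
x1 = x0 + rng.normal(size=n)     # noisy mixture (toy)

t = np.linspace(0.0, 1.0, steps + 1)
dW = rng.normal(scale=np.sqrt(1.0 / steps), size=(steps, n))
W = np.vstack([np.zeros(n), np.cumsum(dW, axis=0)])   # Brownian motion path

# bridge: X_t = (1 - t) x0 + t x1 + sigma (W_t - t W_1); X_0 = x0 and X_1 = x1
X = (1 - t)[:, None] * x0 + t[:, None] * x1 + sigma * (W - t[:, None] * W[-1])
print(np.abs(X[-1] - x1).max())   # exactly zero up to round-off
```

The pinning term sigma (W_t - t W_1) vanishes at both endpoints, which is what removes the prior mismatch at inference.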
Adversarial training is a standard technique for training adversarially
robust models. In this paper, we study adversarial training as an alternating
best-response strategy in a 2-player zero-sum game. We prove that even in a
simple scenario of a linear classifier and a statistical model that abstracts
robust vs. non-robust features, the alternating best response strategy of such
a game may not converge. On the other hand, a unique pure Nash equilibrium of the
game exists and is provably robust. We support our theoretical results with
experiments, showing the non-convergence of adversarial training and the
robustness of Nash equilibrium.
( 2
min )
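The non-convergence of alternating best response in a 2-player zero-sum game has a classic toy analogue, matching pennies, sketched below (an illustration of the general phenomenon, not the paper's linear-classifier setting): each player's best response to the other's last action cycles forever instead of settling.

```python
# Matching pennies: row wants to MATCH the column player's action,
# column wants to MISMATCH the row player's action. Actions are 0 or 1.

def best_response_row(col):
    return col            # matching is row's best response

def best_response_col(row):
    return 1 - row        # mismatching is column's best response

row, col = 0, 1
history = []
for _ in range(8):
    row = best_response_row(col)   # row responds to column's last action
    col = best_response_col(row)   # column responds to row's new action
    history.append((row, col))

print(history)   # alternates between (1, 0) and (0, 1) forever
```

The unique (mixed) Nash equilibrium plays each action with probability 1/2, but the alternating best-response dynamics never reach it, mirroring the paper's result for adversarial training.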
In reinforcement learning for safety-critical settings, it is often desirable
for the agent to obey safety constraints at all points in time, including
during training. We present a novel neurosymbolic approach called SPICE to
solve this safe exploration problem. SPICE uses an online shielding layer based
on symbolic weakest preconditions to achieve a more precise safety analysis
than existing tools without unduly impacting the training process. We evaluate
the approach on a suite of continuous control benchmarks and show that it can
achieve comparable performance to existing safe learning techniques while
incurring fewer safety violations. Additionally, we present theoretical results
showing that SPICE converges to the optimal safe policy under reasonable
assumptions.
( 2
min )
Inverse molecular design is critical in material science and drug discovery,
where the generated molecules should satisfy certain desirable properties. In
this paper, we propose equivariant energy-guided stochastic differential
equations (EEGSDE), a flexible framework for controllable 3D molecule
generation under the guidance of an energy function in diffusion models.
Formally, we show that EEGSDE naturally exploits the geometric symmetry in 3D
molecular conformation, as long as the energy function is invariant to
orthogonal transformations. Empirically, under the guidance of designed energy
functions, EEGSDE significantly improves the baseline on QM9, in inverse
molecular design targeted to quantum properties and molecular structures.
Furthermore, EEGSDE is able to generate molecules with multiple target
properties by combining the corresponding energy functions linearly.
( 2
min )
Temporal distributional shifts, with underlying dynamics changing over time,
frequently occur in real-world time series and pose a fundamental challenge for
deep neural networks (DNNs). In this paper, we propose a novel deep sequence
model based on the Koopman theory for time series forecasting: Koopman Neural
Forecaster (KNF) which leverages DNNs to learn the linear Koopman space and the
coefficients of chosen measurement functions. KNF imposes appropriate inductive
biases for improved robustness against distributional shifts, employing both a
global operator to learn shared characteristics and a local operator to capture
changing dynamics, as well as a specially-designed feedback loop to
continuously update the learned operators over time for rapidly varying
behaviors. We demonstrate that KNF achieves superior performance compared
to the alternatives, on multiple time series datasets that are shown to suffer
from distribution shifts.
( 2
min )
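The core idea of learning a linear Koopman operator can be illustrated with a DMD-style least-squares fit, sketched below on a toy linear system (an assumed stand-in, not the paper's KNF architecture with learned measurement functions): a single matrix K mapping each state to the next is recovered from one trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [-0.1, 0.9]])   # true linear dynamics (toy)
x = np.zeros((50, 2)); x[0] = [1.0, 0.0]
for step in range(49):
    x[step + 1] = A @ x[step]

X, Y = x[:-1], x[1:]                      # pairs (x_t, x_{t+1})
# solve min ||X B - Y||; Y = X A^T exactly, so B = A^T and K = B^T recovers A
K = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.abs(K - A).max())
```

KNF replaces the raw state with learned measurement functions so that nonlinear dynamics become (approximately) linear in the lifted space; the least-squares step above is the linear core of that construction.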
A kernel-based quantum classifier is the most practical and influential
quantum machine learning technique for the hyper-linear classification of
complex data. We propose a Variational Quantum Approximate Support Vector
Machine (VQASVM) algorithm that demonstrates empirical sub-quadratic run-time
complexity with quantum operations feasible even on NISQ computers. We
tested our algorithm on a toy example dataset using cloud-based NISQ
machines as a proof of concept. We also numerically investigated its
performance on the standard Iris flower and MNIST datasets to confirm the
practicality and scalability.
( 2
min )
We analyze a large corpus of police incident narrative documents to
understand the spatial distribution of the topics they cover. The motivation
is that the narrative in each incident report contains very fine-grained
information that is richer than the category manually assigned by the police.
Our approach is to split the corpus into topics using
two different unsupervised machine learning algorithms - Latent Dirichlet
Allocation and Non-negative Matrix Factorization. We validate the performance
of each learned topic model using model coherence. Then, using a k-nearest
neighbors density ratio estimation (kNN-DRE) approach that we propose, we
estimate the spatial density ratio per topic and use this for data discovery
and analysis of each topic, allowing for insights into the described incidents
at scale. We provide a qualitative assessment of each topic and highlight some
key benefits for using our kNN-DRE model for estimating spatial trends.
( 2
min )
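A textbook kNN density-ratio estimator (an assumption on my part; the paper's kNN-DRE may differ in detail) estimates r(x) = p(x)/q(x) from the distance to the k-th nearest neighbour in samples drawn from p and q: r(x) ≈ (n_q/n_p) · (d_q(x)/d_p(x))^dim, since each kNN density estimate is k over sample size times the ball volume.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 1, 10
xp = rng.normal(0.0, 1.0, size=(500, dim))   # sample from p = N(0, 1)
xq = rng.normal(3.0, 1.0, size=(500, dim))   # sample from q = N(3, 1)

def kth_dist(points, x, k):
    # distance from x to its k-th nearest neighbour in `points`
    d = np.sort(np.abs(points - x).max(axis=1))   # Chebyshev metric, fine in 1-D
    return d[k - 1]

def knn_density_ratio(x):
    dp = kth_dist(xp, x, k)
    dq = kth_dist(xq, x, k)
    return (len(xq) / len(xp)) * (dq / dp) ** dim

# high where p dominates, low where q dominates
print(knn_density_ratio(np.zeros(dim)), knn_density_ratio(np.full(dim, 3.0)))
```

Evaluating the ratio per topic over spatial locations, as the paper does, turns this into a map of where a topic is over- or under-represented relative to a reference distribution.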
In this paper, we study the generalization performance of global minima for
implementing empirical risk minimization (ERM) on over-parameterized deep ReLU
nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove
that there exist perfect global minima achieving almost optimal generalization
error bounds for numerous types of data under mild conditions. Since
over-parameterization is crucial to guarantee that the global minima of ERM on
deep ReLU nets can be realized by the widely used stochastic gradient descent
(SGD) algorithm, our results indeed fill a gap between optimization and
generalization.
( 2
min )
Fixing energy leakage caused by different anomalies can result in significant
energy savings and extended appliance life. Further, it assists grid operators
in scheduling their resources to meet the actual needs of end users, while
helping end users reduce their energy costs. In this paper, we analyze the
patterns pertaining to the power consumption of dishwashers used in two houses
of the REFIT dataset. Then two autoencoders (AEs), with a 1D-CNN and a TCN as
backbones, are trained to differentiate the normal patterns from the abnormal
ones. Our results indicate that the TCN outperforms the 1D-CNN in detecting anomalies in
energy consumption. Finally, the data from the Fridge_Freezer and the Freezer
of house No. 3 in REFIT is also used to evaluate our approach.
( 2
min )
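The underlying recipe, train an autoencoder on normal patterns and flag windows whose reconstruction error exceeds a threshold, can be sketched without deep learning. Below, a linear PCA "autoencoder" stands in for the paper's 1D-CNN/TCN backbones, and the sinusoidal consumption windows and injected spike are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 48)
# 200 "normal" consumption windows: phase-jittered sinusoids plus noise
normal = np.stack([np.sin(t + ph) for ph in rng.uniform(0, 0.3, 200)])
normal += 0.05 * rng.normal(size=normal.shape)

mu = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
P = Vt[:3]                                         # 3-component "bottleneck"

def recon_error(x):
    z = (x - mu) @ P.T                             # encode
    return float(np.abs(x - (mu + z @ P)).mean())  # decode + mean abs error

threshold = max(recon_error(w) for w in normal) * 1.5
anomaly = np.sin(t) + np.where((t > 2) & (t < 3), 2.0, 0.0)  # injected spike
print(recon_error(anomaly) > threshold)
```

A trained 1D-CNN or TCN autoencoder plays the same role as the projection P here, but can capture nonlinear appliance signatures.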
Audio Spectrogram Transformer models rule the field of Audio Tagging,
outperforming the previously dominant Convolutional Neural Networks (CNNs). Their
superiority is based on the ability to scale up and exploit large-scale
datasets such as AudioSet. However, Transformers are demanding in terms of
model size and computational requirements compared to CNNs. We propose a
training procedure for efficient CNNs based on offline Knowledge Distillation
(KD) from high-performing yet complex transformers. The proposed training
scheme and the efficient CNN design based on MobileNetV3 result in models
outperforming previous solutions in terms of parameter and computational
efficiency and prediction performance. We provide models of different
complexity levels, scaling from low-complexity models up to a new
state-of-the-art performance of .483 mAP on AudioSet. Source Code available at:
https://github.com/fschmid56/EfficientAT
( 2
min )
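The standard offline knowledge-distillation objective behind this kind of training can be written in a few lines. The sketch below uses the generic KD loss with toy logits (the exact loss weighting and temperature in the paper are assumptions here): the student minimises a mix of cross-entropy on the labels and KL divergence to the teacher's temperature-softened predictions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)   # soft targets from the teacher
    p_s = softmax(student_logits, T)
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
    ce = -float(np.log(softmax(student_logits)[label]))
    # T^2 rescales the soft-target term, as in standard distillation
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

good = kd_loss([4.0, 1.0, 0.0], [5.0, 2.0, 0.0], label=0)   # student agrees
bad = kd_loss([0.0, 4.0, 1.0], [5.0, 2.0, 0.0], label=0)    # student disagrees
print(good, bad)
```

"Offline" here means the teacher (a large transformer) is run once to precompute soft targets, after which only the efficient CNN student is trained.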
Self-supervised learning has significantly improved the performance of many
NLP tasks. However, how self-supervised learning discovers useful
representations, and why it is better than traditional approaches such as
probabilistic models, are still largely unknown. In this paper, we focus on the
context of topic modeling and highlight a key advantage of self-supervised
learning - when applied to data generated by topic models, self-supervised
learning can be oblivious to the specific model, and hence is less susceptible
to model misspecification. In particular, we prove that commonly used
self-supervised objectives based on reconstruction or contrastive samples can
both recover useful posterior information for general topic models.
Empirically, we show that the same objectives can perform on par with posterior
inference using the correct model, while outperforming posterior inference
using misspecified models.
( 2
min )
Ridesharing platforms are a type of two-sided marketplace where
"supply-demand balance" is critical for market efficiency and yet is complex
to define and analyze. We present a unified analytical framework based on the
graph-based equilibrium metric (GEM) for quantifying the supply-demand
spatiotemporal state and efficiency of a ridesharing marketplace. GEM was
developed as a generalized Wasserstein distance between the supply and demand
distributions in a ridesharing market and has been used as an evaluation metric
for algorithms expected to improve supply-demand alignment. Building upon GEM,
we develop SD-GEM, a dual-perspective (supply- and demand-side) representation
of rideshare market equilibrium. We show that there are often disparities
between the two views and examine how this dual-view leads to the notion of
market efficiency, in which we propose novel statistical tests for capturing
improvement and explaining the underlying driving factors.
( 2
min )
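GEM generalizes the Wasserstein distance between supply and demand distributions. A much-simplified 1-D analogue (my simplification, not the graph-based construction in the paper) makes the idea concrete: with equal sample counts, the empirical Wasserstein-1 distance is the mean absolute difference between the sorted supply and demand locations.

```python
import numpy as np

def wasserstein_1d(supply, demand):
    # optimal transport in 1-D pairs sorted samples with sorted samples
    return float(np.mean(np.abs(np.sort(supply) - np.sort(demand))))

supply = np.array([0.0, 1.0, 2.0])   # driver locations (toy)
demand = np.array([0.5, 1.5, 2.5])   # rider requests (toy)
print(wasserstein_1d(supply, demand))  # → 0.5: every unit of supply moves 0.5
```

A small value means supply already sits where demand is; dispatch or repositioning algorithms can then be scored by how much they shrink this distance, which is how GEM is used as an evaluation metric.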
Federated Learning (FL) has emerged as a de facto machine learning area and
has received rapidly increasing research interest from the community. However,
catastrophic forgetting caused by data heterogeneity and partial participation
poses distinctive challenges for FL, which are detrimental to the performance.
To tackle the problems, we propose a new FL approach (namely GradMA), which
takes inspiration from continual learning to simultaneously correct the
server-side and worker-side update directions as well as take full advantage of
the server's rich computing and memory resources. Furthermore, we elaborate a
memory reduction strategy to enable GradMA to accommodate FL with a large scale
of workers. We then analyze convergence of GradMA theoretically under the
smooth non-convex setting and show that its convergence rate achieves a linear
speed-up w.r.t. the number of sampled active workers. Finally, our
extensive experiments on various image classification tasks show that GradMA
achieves significant performance gains in accuracy and communication efficiency
compared to SOTA baselines.
( 2
min )
Estimation of the complete distribution of a random variable is a useful
primitive for both manual and automated decision making. This problem has
received extensive attention in the i.i.d. setting, but the arbitrary data
dependent setting remains largely unaddressed. Consistent with known
impossibility results, we present computationally felicitous time-uniform and
value-uniform bounds on the CDF of the running averaged conditional
distribution of a real-valued random variable which are always valid and
sometimes trivial, along with an instance-dependent convergence guarantee. The
importance-weighted extension is appropriate for estimating complete
counterfactual distributions of rewards given controlled experimentation data
exhaust, e.g., from an A/B test or a contextual bandit.
( 2
min )
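For contrast with the paper's time-uniform, data-dependent bounds, the classical i.i.d. baseline is the Dvoretzky–Kiefer–Wolfowitz (DKW) band around the empirical CDF, with half-width sqrt(log(2/δ)/(2n)). This sketch is that baseline only, not the paper's construction; the sample and evaluation point are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 1000, 0.05
x = rng.normal(size=n)                 # i.i.d. sample (toy)

def ecdf(t):
    return float(np.mean(x <= t))

# DKW: sup_t |F_n(t) - F(t)| <= eps with probability >= 1 - delta
eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
lo = max(0.0, ecdf(0.0) - eps)
hi = min(1.0, ecdf(0.0) + eps)
print(lo, hi)   # a band that should contain the true CDF value F(0) = 0.5
```

The paper's bounds differ in being valid uniformly over time under arbitrary data dependence, at the cost of being "sometimes trivial".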
Graph neural networks (GNNs) have been applied to a large variety of
applications in materials science and chemistry. Here, we recapitulate the
graph construction for crystalline (periodic) materials and investigate its
impact on the GNN model performance. We suggest the asymmetric unit cell as a
representation to reduce the number of atoms by using all symmetries of the
system. With a simple but systematically built GNN architecture based on
message passing and line graph templates, we furthermore introduce a general
architecture (Nested Graph Network, NGN) that is applicable to a wide range of
tasks and systematically improves state-of-the-art results on the MatBench
benchmark datasets.
( 2
min )
This paper introduces a new sparse Bayesian learning (SBL) algorithm that
jointly recovers a temporal sequence of edge maps from noisy and under-sampled
Fourier data. The new method is cast in a Bayesian framework and uses a prior
that simultaneously incorporates intra-image information to promote sparsity in
each individual edge map with inter-image information to promote similarities
in any unchanged regions. By treating both the edges as well as the similarity
between adjacent images as random variables, there is no need to separately
form regions of change. Thus we avoid both additional computational cost and
any information loss resulting from pre-processing the image. Our
numerical examples demonstrate that our new method compares favorably with more
standard SBL approaches.
( 2
min )
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
( 2
min )
We study the consequences of mode-collapse of normalizing flows in the
context of lattice field theory. Normalizing flows allow for independent
sampling. For this reason, it is hoped that they can avoid the tunneling
problem of local-update MCMC algorithms for multi-modal distributions. In this
work, we first point out that the tunneling problem is also present for
normalizing flows but is shifted from the sampling to the training phase of the
algorithm. Specifically, normalizing flows often suffer from mode-collapse for
which the training process assigns vanishingly low probability mass to relevant
modes of the physical distribution. This may result in a significant bias when
the flow is used as a sampler in a Markov-Chain or with Importance Sampling. We
propose a metric to quantify the degree of mode-collapse and derive a bound on
the resulting bias. Furthermore, we propose various mitigation strategies in
particular in the context of estimating thermodynamic observables, such as the
free energy.
( 2
min )
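The bias that mode-collapse induces in importance sampling can be demonstrated on a toy 1-D target (my toy setup, not a lattice field theory): when the proposal assigns vanishing mass to one mode of a bimodal target, the self-normalised importance-sampling estimate is pulled toward the covered mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    # equal mixture of N(-3, 1) and N(+3, 1); the true mean is 0
    return 0.5 * (np.exp(-0.5 * (x + 3) ** 2)
                  + np.exp(-0.5 * (x - 3) ** 2)) / np.sqrt(2 * np.pi)

def proposal_pdf(x):
    # "mode-collapsed" proposal covering only the +3 mode
    return np.exp(-0.5 * (x - 3) ** 2) / np.sqrt(2 * np.pi)

xs = rng.normal(3.0, 1.0, size=20000)          # samples from the proposal
w = target_pdf(xs) / proposal_pdf(xs)          # importance weights
est_mean = float(np.sum(w * xs) / np.sum(w))   # self-normalised IS estimate
print(est_mean)   # far from the true mean 0: the missed mode biases the estimate
```

The paper's mode-collapse metric bounds exactly this kind of bias, and its mitigation strategies aim to restore mass on the missed mode before the flow is used as a sampler.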
This study addresses the problem of performing clustering in the presence of
two types of background knowledge: pairwise constraints and monotonicity
constraints. To achieve this, the formal framework to perform clustering under
monotonicity constraints is, firstly, defined, resulting in a specific distance
measure. Pairwise constraints are integrated afterwards by designing an
objective function which combines the proposed distance measure and a pairwise
constraint-based penalty term, in order to fuse both types of information. This
objective function can be optimized with an EM optimization scheme. The
proposed method serves as the first approach to the problem it addresses, as it
is the first method designed to work with the two types of background knowledge
mentioned above. Our proposal is tested in a variety of benchmark datasets and
in a real-world case study.
( 2
min )
Automatic recommendation systems based on deep neural networks have become
extremely popular during the last decade. Some of these systems can however be
used for applications which are ranked as High Risk by the European Commission
in the A.I. act, as for instance for online job candidate recommendation. When
used in the European Union, commercial AI systems for this purpose will then be
required to have proper statistical properties with regard to potential
discrimination they could engender. This motivated our contribution, where we
present a novel optimal transport strategy to mitigate undesirable algorithmic
biases in multi-class neural-network classification. Our strategy is model
agnostic and can be used on any multi-class classification neural-network
model. To anticipate the certification of recommendation systems using textual
data, we then used it on the Bios dataset, for which the learning task consists
in predicting the occupation of female and male individuals, based on their
LinkedIn biography. Results show that it can reduce undesired algorithmic
biases in this context to lower levels than a standard strategy.
( 2
min )
We introduce a new methodology dubbed "safe peeling" to accelerate the
resolution of l0-regularized least-squares problems via a Branch-and-Bound
(BnB) method. Our procedure makes it possible to tighten the convex relaxation considered
at each node of the BnB decision tree and therefore potentially allows for more
aggressive pruning. Numerical simulations show that our proposed methodology
leads to significant gains in terms of number of nodes explored and overall
solving time.
( 2
min )
Bayesian experimental design (BED) provides a powerful and general framework
for optimizing the design of experiments. However, its deployment often poses
substantial computational challenges that can undermine its practical use. In
this review, we outline how recent advances have transformed our ability to
overcome these challenges and thus utilize BED effectively, before discussing
some key areas for future development in the field.
( 2
min )
We consider the problem of tracking an unknown time varying parameter that
characterizes the probabilistic evolution of a sequence of independent
observations. To this aim, we propose a stochastic gradient descent-based
recursive scheme in which the log-likelihood of the observations acts as time
varying gain function. We prove convergence in mean-square error in a suitable
neighbourhood of the unknown time varying parameter and illustrate the details
of our findings in the case where data are generated from distributions
belonging to the exponential family.
( 2
min )
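The recursive scheme described above can be sketched for the simplest exponential-family case (my toy instantiation: Gaussian observations with known unit variance and a linearly drifting mean): the stochastic-gradient recursion follows the score of the log-likelihood and tracks the slowly varying parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
T, lr = 2000, 0.05
theta_true = np.linspace(0.0, 2.0, T)   # slowly drifting true mean
theta_hat = 0.0
for t in range(T):
    x = theta_true[t] + rng.normal()    # one observation at time t
    # gradient of log N(x; theta, 1) w.r.t. theta is (x - theta)
    theta_hat += lr * (x - theta_hat)

print(theta_hat)   # sits near the final true value 2.0, lagging slightly
```

The step size trades tracking lag against noise: smaller lr averages more observations but follows the drift more slowly, which is the tension the paper's mean-square-error analysis quantifies.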
In an effort to address the training instabilities of GANs, we introduce a
class of dual-objective GANs with different value functions (objectives) for
the generator (G) and discriminator (D). In particular, we model each objective
using $\alpha$-loss, a tunable classification loss, to obtain
$(\alpha_D,\alpha_G)$-GANs, parameterized by $(\alpha_D,\alpha_G)\in
[0,\infty)^2$. For a sufficiently large number of samples and capacities for G
and D, we show that the resulting non-zero sum game simplifies to minimizing an
$f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. In the
finite sample and capacity setting, we define estimation error to quantify the
gap in the generator's performance relative to the optimal setting with
infinite samples and obtain upper bounds on this error, showing it to be order
optimal under certain conditions. Finally, we highlight the value of tuning
$(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic
2D Gaussian mixture ring and the Stacked MNIST datasets.
( 2
min )
Forest-based methods have recently gained in popularity for non-parametric
treatment effect estimation. Building on this line of work, we introduce causal
survival forests, which can be used to estimate heterogeneous treatment effects
in a survival and observational setting where outcomes may be right-censored.
Our approach relies on orthogonal estimating equations to robustly adjust for
both censoring and selection effects under unconfoundedness. In our
experiments, we find our approach to perform well relative to a number of
baselines.
( 2
min )
Kernel methods, being supported by a well-developed theory and coming with
efficient algorithms, are among the most popular and successful machine
learning techniques. From a mathematical point of view, these methods rest on
the concept of kernels and function spaces generated by kernels, so called
reproducing kernel Hilbert spaces. Motivated by recent developments of learning
approaches in the context of interacting particle systems, we investigate
kernel methods acting on data with many measurement variables. We show the
rigorous mean field limit of kernels and provide a detailed analysis of the
limiting reproducing kernel Hilbert space. Furthermore, several examples of
kernels, that allow a rigorous mean field limit, are presented.
( 2
min )
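To make the setting concrete, here is a toy "double-sum" kernel between two particle ensembles: it averages a Gaussian base kernel over all pairs, so its value depends on the ensembles only through their empirical measures, which is exactly the structure that admits a mean-field limit as the number of particles grows. This is an illustrative sketch, not the paper's construction; all names are chosen here.

```python
import math

def gaussian(x, y, sigma=1.0):
    """Gaussian base kernel on the real line."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def ensemble_kernel(X, Y, sigma=1.0):
    """Double-sum kernel between two particle ensembles: the average of the
    base kernel over all pairs. Since it depends on X and Y only through
    their empirical measures, it has a natural mean-field (N -> infinity)
    limit as an integral against the limiting measures."""
    return sum(gaussian(x, y, sigma) for x in X for y in Y) / (len(X) * len(Y))
```

Note the permutation invariance: reordering the particles within an ensemble leaves the kernel value unchanged, which is a prerequisite for a well-defined limit on measures.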
Semi-supervised learning aims to train a model using limited labels.
State-of-the-art semi-supervised methods for image classification such as PAWS
rely on self-supervised representations learned with large-scale unlabeled but
curated data. However, PAWS is often less effective when using real-world
unlabeled data that is uncurated, e.g., contains out-of-class data. We propose
RoPAWS, a robust extension of PAWS that can work with real-world unlabeled
data. We first reinterpret PAWS as a generative classifier that models
densities using kernel density estimation. From this probabilistic perspective,
we calibrate its prediction based on the densities of labeled and unlabeled
data, which leads to a simple closed-form solution from the Bayes' rule. We
demonstrate that RoPAWS significantly improves PAWS for uncurated Semi-iNat by
+5.3% and curated ImageNet by +0.4%.
( 2
min )
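The generative reading of PAWS can be illustrated with a tiny KDE-based classifier: model class-conditional densities with kernel density estimation over the labeled data and predict via Bayes' rule. This sketch is 1-D and omits RoPAWS's calibration against the unlabeled-data density; the function names are chosen here, not taken from the paper's code.

```python
import math

def kde(x, points, bw=1.0):
    """Kernel density estimate with a Gaussian kernel (unnormalized scale)."""
    if not points:
        return 0.0
    return sum(math.exp(-(x - p) ** 2 / (2 * bw ** 2)) for p in points) / len(points)

def predict_proba(x, labeled, bw=1.0):
    """Generative classifier: p(y|x) proportional to p(x|y) p(y), with KDE
    densities and class priors proportional to class counts. (RoPAWS would
    additionally calibrate using the density of unlabeled data; omitted.)"""
    scores = {y: kde(x, pts, bw) * len(pts) for y, pts in labeled.items()}
    z = sum(scores.values()) or 1.0
    return {y: s / z for y, s in scores.items()}
```

A point near one class's labeled examples gets a confident prediction, while a point equidistant from both classes is scored near 0.5, which is the density-aware behavior the paper builds on.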
Partitioning a set of elements into subsets of a priori unknown sizes is
essential in many applications. These subset sizes are rarely explicitly
learned - be it the cluster sizes in clustering applications or the number of
shared versus independent generative latent factors in weakly-supervised
learning. Probability distributions over correct combinations of subset sizes
are non-differentiable due to hard constraints, which prohibit gradient-based
optimization. In this work, we propose the differentiable hypergeometric
distribution. The hypergeometric distribution models the probability of
different group sizes based on their relative importance. We introduce
reparameterizable gradients to learn the importance between groups and
highlight the advantage of explicitly learning the size of subsets in two
typical applications: weakly-supervised learning and clustering. In both
applications, we outperform previous approaches, which rely on suboptimal
heuristics to model the unknown size of groups.
( 2
min )
The most recent multi-source covariate shift algorithm is an efficient
hyperparameter optimization algorithm for missing target output. In this paper,
we extend this algorithm to the framework of federated learning. For data
islands in federated learning and covariate shift adaptation, we propose the
federated domain adaptation estimate of the target risk which is asymptotically
unbiased with a desirable asymptotic variance property. We construct a weighted
model for the target task and propose the federated covariate shift adaptation
algorithm, which performs well in our setting. The efficacy of our method is
justified both theoretically and empirically.
( 2
min )
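The classical covariate-shift estimator that this federated extension builds on can be written in a few lines: reweight each source loss by the density ratio w_i = p_target(x_i) / p_source(x_i), so the weighted average estimates the target risk. A minimal self-normalized sketch (function name chosen here; the paper's federated estimator aggregates such quantities across data islands):

```python
def weighted_target_risk(losses, weights):
    """Self-normalized importance-weighted risk. With weights equal to the
    density ratio p_target(x_i) / p_source(x_i), the weighted average of
    per-sample source losses estimates the risk under the target covariate
    distribution."""
    total = sum(weights)
    return sum(l * w for l, w in zip(losses, weights)) / total
```

With uniform weights this reduces to the plain empirical risk; upweighting samples that are more likely under the target shifts the estimate accordingly.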
This paper introduces a new framework of algebraic equivalence relations
between time series and new distance metrics between them, then applies these
to investigate the Australian "Black Summer" bushfire season of 2019-2020.
First, we introduce a general framework for defining equivalence between time
series, heuristically intended to be equivalent if they differ only up to
noise. Our first specific implementation is based on using change point
algorithms and comparing statistical quantities such as mean or variance in
stationary segments. We thus derive the existence of such equivalence relations
on the space of time series, such that the quotient spaces can be equipped with
a metrizable topology. Next, we illustrate specifically how to define and
compute such distances among a collection of time series and perform clustering
and additional analysis thereon. Then, we apply these insights to analyze air
quality data across New South Wales, Australia, during the 2019-2020 bushfires.
There, we investigate structural similarity with respect to this data and
identify locations that were impacted anomalously by the fires relative to
their location. This may have implications regarding the appropriate management
of resources to avoid gaps in the defense against future fires.
( 2
min )
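The first implementation described above, comparing statistical quantities over stationary segments, can be sketched in a few lines. This toy version takes segmentations as given (a real pipeline would obtain them from a change point algorithm) and defines a distance from the sorted segment means, so two series that differ only in noise get distance near zero. All names are chosen here for illustration.

```python
def segment_means(series, breakpoints):
    """Mean of each stationary segment; segments are delimited by breakpoint
    indices, e.g. breakpoints=[3] splits series into [:3] and [3:]."""
    bounds = [0] + list(breakpoints) + [len(series)]
    return [sum(series[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def series_distance(x, bx, y, by):
    """Toy distance between two segmented series: compare sorted segment
    means, padding the shorter list with its last value. Series equivalent
    'up to noise' (same segment means) get distance 0."""
    mx, my = sorted(segment_means(x, bx)), sorted(segment_means(y, by))
    n = max(len(mx), len(my))
    mx += [mx[-1]] * (n - len(mx))
    my += [my[-1]] * (n - len(my))
    return max(abs(a - b) for a, b in zip(mx, my))
```

A distance of this kind can then feed a standard clustering routine over a collection of series, as in the air-quality analysis.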
Traffic systems can operate in different modes. In a previous work, we
identified these modes as different quasi-stationary states in the correlation
structure. Here, we analyze the transitions between such quasi-stationary
states, i.e., how the system changes its operational mode. In the longer run
this might be helpful to forecast the time evolution of correlation patterns in
traffic. Taking the Cologne orbital motorways as an example, we construct a state
transition network for each quarter of 2015 and find a seasonal dependence for
those quasi-stationary states in the traffic system. Using the PageRank
algorithm, we identify and explore the dominant states which occur frequently
within a moving time window of 60 days in 2015. To the best of our knowledge,
this is the first study of this type for traffic systems.
( 2
min )
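The PageRank step used to find dominant states is just power iteration on the transition network. A minimal sketch over an adjacency dict (names chosen here; a real analysis would weight edges by observed transition counts):

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an adjacency dict {node: [successors]}.
    Returns a dict of ranks summing to 1; high-rank nodes are the states
    the system transitions into most often."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, succs in adj.items():
            if succs:
                share = rank[v] / len(succs)
                for u in succs:
                    new[u] += damping * share
            else:  # dangling state: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```

In the traffic setting, a node is a quasi-stationary correlation state and an edge records an observed transition within the moving time window.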
Clustering is a widely used technique with a long and rich history in a
variety of areas. However, most existing algorithms do not scale well to large
datasets, or are missing theoretical guarantees of convergence. This paper
introduces a provably robust clustering algorithm based on loss minimization
that performs well on Gaussian mixture models with outliers. It provides
theoretical guarantees that the algorithm obtains high accuracy with high
probability under certain assumptions. Moreover, it can also be used as an
initialization strategy for $k$-means clustering. Experiments on real-world
large-scale datasets demonstrate the effectiveness of the algorithm when
clustering a large number of clusters, and a $k$-means algorithm initialized by
the algorithm outperforms many of the classic clustering methods in both speed
and accuracy, while scaling well to large datasets such as ImageNet.
( 2
min )
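To illustrate the initialization idea (this is a toy stand-in, not the paper's provably robust algorithm): a farthest-point seeding that skips the most extreme candidates, so isolated outliers are not chosen as initial centers before Lloyd's k-means runs.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def robust_init(points, k, trim=0.1):
    """Farthest-point initialization that skips the top `trim` fraction of
    candidates (suspected outliers) at each step. The returned centers can
    seed a standard k-means (Lloyd) loop."""
    centers = [points[0]]  # deterministic first pick for simplicity
    cut = max(1, int(len(points) * trim))
    while len(centers) < k:
        # sort points by distance to their nearest chosen center
        by_gap = sorted(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(by_gap[-1 - cut])  # farthest point below the trimmed tail
    return centers
```

Plain farthest-point seeding would pick the outlier first; trimming the tail keeps the seeds inside the actual clusters.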
We propose a class of models based on Fisher's Linear Discriminant (FLD) in
the context of domain adaptation. The class is the convex combination of two
hypotheses: i) an average hypothesis representing previously seen source tasks
and ii) a hypothesis trained on a new target task. For a particular generative
setting we derive the optimal convex combination of the two models under 0-1
loss, propose a computable approximation, and study the effect of various
parameter settings on the relative risks between the optimal hypothesis,
hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the
proposed optimal classifier in the context of EEG- and ECG-based classification
settings and argue that the optimal classifier can be computed without access
to direct information from any of the individual source tasks. We conclude by
discussing further applications, limitations, and possible future directions.
( 2
min )
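The hypothesis class above can be sketched directly: a convex combination of two linear discriminant scores, with the mixing weight chosen by a computable surrogate (here, 0-1 loss on held-out target data over a grid, rather than the paper's closed-form optimum). All function names are chosen here for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def combined_score(w_src, w_tgt, x, alpha):
    """Convex combination of two linear discriminants:
    alpha * (average source hypothesis) + (1 - alpha) * (target hypothesis)."""
    return alpha * dot(w_src, x) + (1 - alpha) * dot(w_tgt, x)

def pick_alpha(w_src, w_tgt, val_x, val_y, grid=None):
    """Choose the mixing weight minimizing 0-1 loss on held-out target data,
    a simple computable surrogate for the optimal convex combination."""
    grid = grid or [i / 10 for i in range(11)]
    def err(a):
        preds = [1 if combined_score(w_src, w_tgt, x, a) > 0 else -1 for x in val_x]
        return sum(p != y for p, y in zip(preds, val_y))
    return min(grid, key=err)
```

Note that only the averaged source direction `w_src` is needed, consistent with the claim that the combined classifier can be computed without direct access to the individual source tasks.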
It would be something similar to mnist-ready (https://github.com/saoj/mnist-ready) in Ruby, but in Python. See below:
digit = MNIST.all_set[0] # first one

# An integer corresponding to the digit of the image
puts digit.label # => 7

# The pixels is an one-dimension array of 784 (28 x 28) pixel values from 0 to 255
puts digit.pixels.size # => 784
puts digit.pixels.inspect # => [0, 0, 0, 0, ...
It has this nice feature which allows you to see the digits:
puts digit.ascii_image

 ____________________________
|             7              |
|----------------------------|
|                            |
|             }wJY+I         |
|      #$$$$%ddddddddQ>      |
|      -f?fCM$M$$$$W$$c      |
|          _^---~"8$/        |
|                }$h         |
|               "&$}         |
|               n$8!         |
|              ~@$+          |
|              u$w.          |
|             `k@~           |
|             x$m            |
|            ]$%~            |
|            #$L             |
|           .k$*I            |
|           l$$]             |
|          ;#$f              |
|          u$$>              |
|         +%$$>              |
|         r$$*l              |
|          r$h               |
|____________________________|
submitted by /u/niosurfer
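A minimal sketch of what such a Python port's surface could look like, mirroring the Ruby API above. Class and attribute names are hypothetical, the glyph ramp is an arbitrary choice, and loading from the real MNIST IDX files is omitted; the class just wraps an already-decoded sample.

```python
class Digit:
    """One MNIST sample: a label and a flat list of 784 (28 x 28) grayscale
    pixels in 0..255, mirroring mnist-ready's Ruby interface (hypothetical
    Python names)."""

    GLYPHS = " .:-=+*#%@"  # brighter pixel -> denser glyph

    def __init__(self, label, pixels):
        assert len(pixels) == 784
        self.label = label
        self.pixels = pixels

    @property
    def ascii_image(self):
        """Render the digit as 28 rows of ASCII glyphs, framed like the
        Ruby version's output."""
        border = "|" + "-" * 28 + "|"
        lines = [border]
        for r in range(28):
            row = self.pixels[r * 28:(r + 1) * 28]
            lines.append("|" + "".join(self.GLYPHS[p * 10 // 256] for p in row) + "|")
        lines.append(border)
        return "\n".join(lines)
```

In practice an `MNIST.all_set`-style loader would parse the `train-images-idx3-ubyte` / `train-labels-idx1-ubyte` files and yield `Digit` instances.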
Hi everyone. Now ChatRWKV v2 can split RWKV to multiple GPUs, or stream layers (compute layer-by-layer), so you can run RWKV 14B with as few as 3G VRAM. https://github.com/BlinkDL/ChatRWKV
Example:
'cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32' = first 10 layers on cuda:0 fp16, then 8 layers on cuda:1 fp16, then on cpu fp32
'cuda fp16 *20+' = first 20 layers on cuda fp16, then stream the rest on it
And RWKV is now a pip package: https://pypi.org/project/rwkv/
os.environ['RWKV_JIT_ON'] = '1'
os.environ["RWKV_CUDA_ON"] = '0' # if '1' then compile CUDA kernel for seq mode (much faster)

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

pipeline = PIPELINE(model, "20B_tokenizer.json") # find it in https://github.com/BlinkDL/ChatRWKV
# download models: https://hugg…
( 45
min )
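For illustration, the strategy string format shown above can be parsed with a few lines of Python. This is a sketch of the format as described in the examples, not the library's actual parser:

```python
def parse_strategy(s):
    """Parse a ChatRWKV-style strategy string such as
    'cuda:0 fp16 *10 -> cuda:1 fp16 *8 -> cpu fp32' into stages.
    Each stage is (device, dtype, layer_count, stream): layer_count None
    means 'all remaining layers', and stream=True marks a trailing '+'
    (stream the remaining layers on that device)."""
    stages = []
    for part in s.split('->'):
        tokens = part.split()
        device, dtype = tokens[0], tokens[1]
        count, stream = None, False
        if len(tokens) > 2 and tokens[2].startswith('*'):
            spec = tokens[2][1:]
            stream = spec.endswith('+')
            count = int(spec.rstrip('+'))
        stages.append((device, dtype, count, stream))
    return stages
```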
The fashion industry is a highly lucrative business, with an estimated value of $2.1 trillion by 2025, as reported by the World Bank. This field encompasses a diverse range of segments, such as the creation, manufacture, distribution, and sales of clothing, shoes, and accessories. The industry is in a constant state of change, with new […]
( 15
min )
This post is co-written with Suhyoung Kim, General Manager at KakaoGames Data Analytics Lab. Kakao Games is a top video game publisher and developer headquartered in South Korea. It specializes in developing and publishing games on PC, mobile, and virtual reality (VR) serving globally. In order to maximize its players’ experience and improve the efficiency […]
( 14
min )
Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. The ability to train custom models through the Custom classification and Custom entity […]
( 10
min )
The world we live in is rapidly changing, and so are the data and features that companies and customers use to train their models. Retraining models to keep them in sync with these changes is critical to maintain accuracy. Therefore, you need an agile and dynamic approach to keep models up to date and adapt […]
( 10
min )
The quest for knowledge at work can feel like searching for a needle in a haystack. But what if the haystack itself could reveal where the needle is? That’s the promise of large language models, or LLMs, the subject of this week’s episode of the NVIDIA AI Podcast featuring Deedy Das and Eddie Zhou, founding …
( 5
min )
Please provide feedback so I can make it better and help the AI movement.
aitoptools.com
submitted by /u/aitoptools
Announcements Are Generative Adversarial Networks Really Useful? Such a question may seem to come from a dinosaur, averse to change. Or from someone selling traditional methods and badmouthing anything that feels threatening to his business. This is not the case here: I always try to stay neutral, and usually – while typically not a first…
The post DSC Weekly 28 February 2023 – Generative Adversarial Networks (GANs): Are They Really Useful? appeared first on Data Science Central.
( 21
min )
Back in 2018, I had the privilege of keynoting at one of Semantic Web Company’s events in Vienna, as well as attending the full event. It was a great opportunity to immerse myself in the Central European perspective on the utility of Linked Open Data standards and how those standards were being applied. I got…
The post FAIR Content: Better Chatbot Answers and Content Reusability at Scale appeared first on Data Science Central.
( 21
min )
In today’s highly competitive market, performing data analytics using machine learning (ML) models has become a necessity for organizations. It enables them to unlock the value of their data, identify trends, patterns, and predictions, and differentiate themselves from their competitors. For example, in the healthcare industry, ML-driven analytics can be used for diagnostic assistance and […]
( 12
min )
Fraud detection is an important problem that has applications in financial services, social media, ecommerce, gaming, and other industries. This post presents an implementation of a fraud detection solution using the Relational Graph Convolutional Network (RGCN) model to predict the probability that a transaction is fraudulent through both the transductive and inductive inference modes. You can deploy our implementation to an Amazon SageMaker endpoint as a real-time fraud detection solution, without requiring external graph storage or orchestration, thereby significantly reducing the deployment cost of the model.
( 11
min )
As the meteoric rise of ChatGPT demonstrates, generative AI can unlock enormous potential for companies, teams and individuals. Whether simplifying time-consuming tasks or accelerating 3D workflows to boost creativity and productivity, generative AI is already making an impact across industries — and there’s much more to come. How generative AI is paving the way for …
( 5
min )
Brian Spears says his children will enjoy a more sustainable planet, thanks in part to AI and high performance computing (HPC) simulations. “I believe I’ll see fusion energy in my lifetime, and I’m confident my daughters will see a fusion-powered world,” said the 45-year-old principal investigator at Lawrence Livermore National Laboratory who helped demonstrate the …
( 6
min )
ManvsMachine steps In the NVIDIA Studio this week to share insights behind fractal art — which uses algorithms to artistically represent calculations — derived from geometric objects as digital images and animations.
( 6
min )
Streaming video on PCs through Google Chrome and Microsoft Edge browsers is getting a GeForce RTX-sized upgrade today with the release of RTX Video Super Resolution (VSR). Nearly 80% of internet bandwidth today is streaming video. And 90% of that content streams at 1080p or lower, including from popular sources like Twitch.tv, YouTube, Netflix, Disney+ …
( 6
min )
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2
min )
In this work, we propose a self-improving artificial intelligence system to
enhance the safety performance of reinforcement learning (RL)-based autonomous
driving (AD) agents using black-box verification methods. RL algorithms have
become popular in AD applications in recent years. However, the performance of
existing RL algorithms heavily depends on the diversity of training scenarios.
A lack of safety-critical scenarios during the training phase could result in
poor generalization performance in real-world driving applications. We propose
a novel framework in which the weaknesses of the training set are explored
through black-box verification methods. After discovering AD failure scenarios,
the RL agent's training is re-initiated via transfer learning to improve its
performance in previously unsafe scenarios. Simulation results demonstrate that
our approach efficiently discovers safety failures of action decisions in
RL-based adaptive cruise control (ACC) applications and significantly reduces
the number of vehicle collisions through iterative applications of our method.
The source code is publicly available at
https://github.com/data-and-decision-lab/self-improving-RL.
( 2
min )
In the end-of-line test of geared motors, the evaluation of product quality
is important. Due to time constraints and the high diversity of variants,
acoustic measurements are more economical than vibration measurements.
However, the acoustic data is affected by industrial disturbing noise.
Therefore, the aim of this study is to investigate the robustness of features
used for anomaly detection in geared motor end-of-line testing. A real-world
dataset with typical faults and acoustic disturbances is recorded by an
acoustic array. This includes industrial noise from the production and
systematically produced disturbances, used to compare the robustness. Overall,
it is proposed to apply features extracted from a log-envelope spectrum
together with psychoacoustic features. The anomaly detection is done by using
the isolation forest or the more universal bagging random miner. Most
disturbances can be circumvented, while the use of a hammer or air pressure
often causes problems. In general, these results are important for condition
monitoring tasks that are based on acoustic or vibration measurements.
Furthermore, a real-world problem description is presented to improve common
signal processing and machine learning tasks.
( 2
min )
The recent literature on online learning to rank (LTR) has established the
utility of prior knowledge to Bayesian ranking bandit algorithms. However, a
major limitation of existing work is the requirement for the prior used by the
algorithm to match the true prior. In this paper, we propose and analyze
adaptive algorithms that address this issue and additionally extend these
results to the linear and generalized linear models. We also consider scalar
relevance feedback on top of click feedback. Moreover, we demonstrate the
efficacy of our algorithms using both synthetic and real-world experiments.
( 2
min )
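To make the Bayesian ranking bandit setting concrete, here is one round of the standard Thompson-sampling step such algorithms build on: draw each item's relevance from its Beta posterior and display the top-k items. This is the fixed-prior baseline (a uniform Beta(1, 1) prior); the paper's contribution is precisely about being adaptive when the assumed prior is wrong. Names are chosen here for illustration.

```python
import random

def thompson_rank(successes, failures, k, rng):
    """One round of Thompson sampling for ranking: sample a relevance score
    for each item from its Beta(successes+1, failures+1) posterior and
    return the top-k item ids."""
    sampled = {i: rng.betavariate(successes[i] + 1, failures[i] + 1)
               for i in successes}
    return sorted(sampled, key=sampled.get, reverse=True)[:k]
```

After showing the list, observed clicks update `successes`/`failures` and the next round re-samples, so exploration falls out of posterior uncertainty.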
Research on deep reinforcement learning (DRL) based production scheduling
(PS) has gained a lot of attention in recent years, primarily due to the high
demand for optimizing scheduling problems in diverse industry settings.
Numerous studies are carried out and published as stand-alone experiments that
often vary only slightly with respect to problem setups and solution
approaches. The programmatic core of these experiments is typically very
similar. Despite this fact, no standardized and resilient framework for
experimentation on PS problems with DRL algorithms has been established so far.
In this paper, we introduce schlably, a Python-based framework that provides
researchers a comprehensive toolset to facilitate the development of PS
solution strategies based on DRL. schlably eliminates the redundant overhead
work that the creation of a sturdy and flexible backbone requires and increases
the comparability and reusability of conducted research work.
( 2
min )
Distributed deep learning (DDL) systems strongly depend on network
performance. Current electronic packet switched (EPS) network architectures and
technologies suffer from variable diameter topologies, low-bisection bandwidth
and over-subscription affecting completion time of communication and collective
operations.
We introduce a near-exascale, full-bisection bandwidth, all-to-all,
single-hop, all-optical network architecture with nanosecond reconfiguration
called RAMP, which supports large-scale distributed and parallel computing
systems (12.8 Tbps per node for up to 65,536 nodes).
For the first time, a custom RAMP-x MPI strategy and a network transcoder is
proposed to run MPI collective operations across the optical circuit switched
(OCS) network in a schedule-less and contention-less manner. RAMP achieves
7.6-171$\times$ speed-up in completion time across all MPI operations compared
to realistic EPS and OCS counterparts. It can also deliver a 1.3-16$\times$ and
7.8-58$\times$ reduction in Megatron and DLRM training time, respectively, while
offering 42-53$\times$ and 3.3-12.4$\times$ improvement in energy consumption
and cost respectively.
( 2
min )
In the context of keyword spotting (KWS), the replacement of handcrafted
speech features by learnable features has not yielded superior KWS performance.
In this study, we demonstrate that filterbank learning outperforms handcrafted
speech features for KWS whenever the number of filterbank channels is severely
decreased. Reducing the number of channels can cause some drop in KWS
performance, but also brings a substantial energy consumption reduction, key when
deploying common always-on KWS on low-resource devices. Experimental results on
a noisy version of the Google Speech Commands Dataset show that filterbank
learning adapts to noise characteristics to provide a higher degree of
robustness to noise, especially when dropout is integrated. Thus, switching
from typically used 40-channel log-Mel features to 8-channel learned features
leads to a relative KWS accuracy loss of only 3.5% while simultaneously
achieving a 6.3x energy consumption reduction.
( 2
min )
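For context, the handcrafted baseline being replaced is a bank of triangular filters applied to spectral bins. The sketch below uses linear spacing for brevity (a real log-Mel bank warps the centers to the mel scale, and a learned bank would make these shapes trainable); reducing `n_channels` from 40 to 8 is exactly the channel reduction discussed above. Names are chosen here.

```python
def triangular_filterbank(n_channels, n_bins):
    """Build n_channels triangular filters over n_bins spectral bins,
    linearly spaced: filter c rises from edge c-1 to a peak of 1 at edge c,
    then falls to 0 at edge c+1."""
    edges = [i * (n_bins - 1) / (n_channels + 1) for i in range(n_channels + 2)]
    bank = []
    for c in range(1, n_channels + 1):
        lo, mid, hi = edges[c - 1], edges[c], edges[c + 1]
        filt = []
        for b in range(n_bins):
            if lo <= b <= mid:
                filt.append((b - lo) / (mid - lo) if mid > lo else 0.0)
            elif mid < b <= hi:
                filt.append((hi - b) / (hi - mid))
            else:
                filt.append(0.0)
        bank.append(filt)
    return bank
```

Channel energies are then dot products of each filter with the power spectrum; a learnable front end replaces these fixed triangles with trained weights.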
The imputation of missing values represents a significant obstacle for many
real-world data analysis pipelines. Here, we focus on time series data and put
forward SSSD, an imputation model that relies on two emerging technologies,
(conditional) diffusion models as state-of-the-art generative models and
structured state space models as internal model architecture, which are
particularly suited to capture long-term dependencies in time series data. We
demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic
imputation and forecasting performance on a broad range of data sets and
different missingness scenarios, including the challenging blackout-missing
scenarios, where prior approaches failed to provide meaningful results.
( 2
min )
In this paper, we study first-order algorithms for solving fully composite
optimization problems over bounded sets. We treat the differentiable and
non-differentiable parts of the objective separately, linearizing only the
smooth components. This provides us with new generalizations of the classical
and accelerated Frank-Wolfe methods, that are applicable to non-differentiable
problems whenever we can access the structure of the objective. We prove global
complexity bounds for our algorithms that are optimal in several settings.
( 2
min )
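As a reminder of the classical method being generalized, here is vanilla Frank-Wolfe on the probability simplex, where the linear minimization oracle is trivial: the best vertex is the coordinate with the smallest gradient entry. This illustrates only the smooth, classical case, not the paper's fully composite setting; names are chosen here.

```python
def frank_wolfe_simplex(grad, x0, steps=2000):
    """Classical Frank-Wolfe over the probability simplex with the standard
    gamma_k = 2/(k+2) step size. The LMO returns a simplex vertex e_i, so
    each iterate is a convex combination and stays feasible."""
    x = list(x0)
    for k in range(steps):
        g = grad(x)
        i = min(range(len(x)), key=lambda j: g[j])  # LMO: vertex minimizing <g, v>
        gamma = 2.0 / (k + 2)
        x = [(1 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# smooth toy objective: ||x - t||^2 with minimizer t inside the simplex
t = (0.5, 0.3, 0.2)
x = frank_wolfe_simplex(lambda v: [2 * (vj - tj) for vj, tj in zip(v, t)],
                        [1.0, 0.0, 0.0])
```

The composite variants in the paper keep this projection-free structure while linearizing only the smooth part of the objective.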
This paper describes our participation in SemEval-2023 Task 9, Intimacy
Analysis of Multilingual Tweets. We fine-tune some of the most popular
transformer models with the training dataset and synthetic data generated by
different data augmentation techniques. During the development phase, our best
results were obtained by using XLM-T. Data augmentation techniques provide a
very slight improvement in the results. Our system ranked in the 27th position
out of the 45 participating systems. Despite its modest results, our system
shows promising results in languages such as Portuguese, English, and Dutch.
All our code is available in the repository
\url{https://github.com/isegura/hulat_intimacy}.
( 2
min )
We study the problem of inferring heterogeneous treatment effects (HTEs) from
time-to-event data in the presence of competing events. Albeit its great
practical relevance, this problem has received little attention compared to its
counterparts studying HTE estimation without time-to-event data or competing
events. We take an outcome modeling approach to estimating HTEs, and consider
how and when existing prediction models for time-to-event data can be used as
plug-in estimators for potential outcomes. We then investigate whether
competing events present new challenges for HTE estimation -- in addition to
the standard confounding problem -- and find that, because there are multiple
definitions of causal effects in this setting -- namely total, direct and
separable effects -- competing events can act as an additional source of
covariate shift depending on the desired treatment effect interpretation and
associated estimand. We theoretically analyze and empirically illustrate when
and how these challenges play a role when using generic machine learning
prediction models for the estimation of HTEs.
( 2
min )
In this study, we validate the findings of previously published papers,
showing the feasibility of an Electroencephalography (EEG) based gaze
estimation. Moreover, we extend previous research by demonstrating that with
only a slight drop in model performance, we can significantly reduce the number
of electrodes, indicating that a high-density, expensive EEG cap is not
necessary for the purposes of EEG-based eye tracking. Using data-driven
approaches, we establish which electrode clusters impact gaze estimation and
how the different types of EEG data preprocessing affect the models'
performance. Finally, we also inspect which recorded frequencies are most
important for the defined tasks.
( 2
min )
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by such operation is
usually neglected and sacrificed in order to reach a fast computation. We
propose to couple the model reduction to a machine learning residual learning,
such that the above-mentioned error can be learnt by a neural network and
inferred for new predictions. We emphasize that the framework maximizes the
exploitation of the high-fidelity information, using it for building the
reduced order model and for learning the residual. In this work we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
( 2
min )
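The POD building block can be illustrated without any linear algebra library: the leading POD mode is the dominant left singular vector of the snapshot matrix, obtainable by power iteration on the snapshot correlation matrix. This sketch keeps one mode for brevity (a real reduced basis takes the top-r modes), and the residual between a snapshot and its projection is exactly the quantity the paper proposes to learn with a network; names are chosen here.

```python
def dominant_pod_mode(snapshots, iters=200):
    """Leading POD mode by power iteration on C = sum_s s s^T, where each
    snapshot s is a state vector (list of floats)."""
    v = [1.0] * len(snapshots[0])
    for _ in range(iters):
        cv = [0.0] * len(v)
        for s in snapshots:
            coeff = sum(si * vi for si, vi in zip(s, v))
            for i, si in enumerate(s):
                cv[i] += coeff * si
        norm = sum(x * x for x in cv) ** 0.5
        v = [x / norm for x in cv]
    return v

def project(snapshot, mode):
    """Rank-1 reduced-order reconstruction of one snapshot onto the mode."""
    c = sum(si * mi for si, mi in zip(snapshot, mode))
    return [c * mi for mi in mode]
```

The residual `snapshot - project(snapshot, mode)` is the model-reduction error that the multi-fidelity network would be trained to predict.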
Federated learning (FL) was originally regarded as a framework for
collaborative learning among clients with data privacy protection through a
coordinating server. In this paper, we propose a new active membership
inference (AMI) attack carried out by a dishonest server in FL. In AMI attacks,
the server crafts and embeds malicious parameters into global models to
effectively infer whether a target data sample is included in a client's
private training data or not. By exploiting the correlation among data features
through a non-linear decision boundary, AMI attacks with a certified guarantee
of success can achieve alarmingly high success rates under rigorous local
differential privacy (LDP) protection; thereby exposing clients' training data
to significant privacy risk. Theoretical and experimental results on several
benchmark datasets show that adding sufficient privacy-preserving noise to
prevent our attack would significantly damage FL's model utility.
( 2
min )
Accurate and real-time traffic state prediction is of great practical
importance for urban traffic control and web mapping services (e.g. Google
Maps). With the support of massive data, deep learning methods have shown their
powerful capability in capturing the complex spatio-temporal patterns of road
networks. However, existing approaches use independent components to model
temporal and spatial dependencies and thus ignore the heterogeneous
characteristics of traffic flow that vary with time and space. In this paper,
we propose a novel dynamic graph convolution network with spatio-temporal
attention fusion. The method not only captures local spatio-temporal
information that changes over time, but also comprehensively models
long-distance and multi-scale spatio-temporal patterns based on the fusion
mechanism of temporal and spatial attention. This design idea can greatly
improve the spatio-temporal perception of the model. We conduct extensive
experiments in 4 real-world datasets to demonstrate that our model achieves
state-of-the-art performance compared to 22 baseline models.
( 2
min )
To address the problem of medical image recognition, computer vision
techniques like convolutional neural networks (CNN) are frequently used.
Recently, 3D CNN-based models dominate the field of magnetic resonance image
(MRI) analytics. Due to the high similarity between MRI data and videos, we
conduct extensive empirical studies on video recognition techniques for MRI
classification to answer the questions: (1) can we directly use video
recognition models for MRI classification, (2) which model is more appropriate
for MRI, (3) are the common tricks like data augmentation in video recognition
still useful for MRI classification? Our work suggests that advanced video
techniques benefit MRI classification. In this paper, four datasets of
Alzheimer's and Parkinson's disease recognition are utilized in experiments,
together with three alternative video recognition models and data augmentation
techniques that are frequently applied to video tasks. In terms of efficiency,
the results reveal that the video framework performs better than 3D-CNN models
by 5%-11% with 50%-66% fewer trainable parameters. This report pushes
forward the potential fusion of 3D medical imaging and video understanding
research.
( 2
min )
Despite the major progress of deep models as learning machines, uncertainty
estimation remains a major challenge. Existing solutions rely on modified loss
functions or architectural changes. We propose to compensate for the lack of
built-in uncertainty estimates by supplementing any network, retrospectively,
with a subsequent vine copula model, in an overall compound we call Vine-Copula
Neural Network (VCNN). Through synthetic and real-data experiments, we show
that VCNNs could be task (regression/classification) and architecture
(recurrent, fully connected) agnostic while providing reliable and
better-calibrated uncertainty estimates, comparable to state-of-the-art
built-in uncertainty solutions.
( 2
min )
This paper presents a novel approach for multimodal data fusion based on the
Vector-Quantized Variational Autoencoder (VQVAE) architecture. The proposed
method is simple yet effective in achieving excellent reconstruction
performance on paired MNIST-SVHN data and WiFi spectrogram data. Additionally,
the multimodal VQVAE model is extended to the 5G communication scenario, where
an end-to-end Channel State Information (CSI) feedback system is implemented to
compress data transmitted between the base-station (eNodeB) and User Equipment
(UE), without significant loss of performance. The proposed model learns a
discriminative compressed feature space for various types of input data (CSI,
spectrograms, natural images, etc), making it a suitable solution for
applications with limited computational resources.
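The discretization at the heart of a VQVAE is a nearest-codebook lookup; a minimal sketch with toy sizes and a random codebook (the encoder/decoder networks and training losses from the paper are omitted):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Nearest-neighbor codebook assignment, the core VQ step of a VQVAE.
    z: (n, d) encoder outputs; codebook: (K, d) learned embeddings."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K) distances
    idx = d2.argmin(axis=1)   # discrete codes (what a CSI feedback link would carry)
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K = 8 codes of dimension 4 (toy sizes)
z = rng.normal(size=(5, 4))
z_q, codes = vector_quantize(z, codebook)
```

In the CSI feedback scenario, only the integer codes need to cross the eNodeB/UE link; the receiver holds the same codebook and reconstructs from the indices.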
( 2
min )
To accelerate the inference of deep neural networks (DNNs), quantization with
low-bitwidth numbers is actively researched. A prominent challenge is to
quantize the DNN models into low-bitwidth numbers without significant accuracy
degradation, especially at very low bitwidths (< 8 bits). This work targets an
adaptive data representation with variable-length encoding called DyBit. DyBit
can dynamically adjust the precision and range of separate bit-fields to adapt
to the distribution of DNN weights and activations. We also propose a
hardware-aware quantization framework with a mixed-precision accelerator to
trade-off the inference accuracy and speedup. Experimental results demonstrate
that the inference accuracy via DyBit is 1.997% higher than the
state-of-the-art at 4-bit quantization, and the proposed framework can achieve
up to 8.1x speedup compared with the original model.
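DyBit's variable-length bit-field encoding is hardware-specific, but the underlying idea of adapting the number format to the observed value distribution can be illustrated with a toy range-adaptive uniform quantizer (my simplification, not the paper's encoding):

```python
import numpy as np

def quantize_adaptive(x, bits=4):
    """Toy range-adaptive quantizer: the scale is chosen from the observed
    tensor statistics, echoing DyBit's adaptation of precision and range
    to the weight/activation distribution."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for signed 4-bit
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=256).astype(np.float32)
q, s = quantize_adaptive(w, bits=4)
err = np.abs(dequantize(q, s) - w).max()        # bounded by ~scale / 2
```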
( 2
min )
We study differentially private (DP) machine learning algorithms as instances
of noisy fixed-point iterations, in order to derive privacy and utility results
from this well-studied framework. We show that this new perspective recovers
popular private gradient-based methods like DP-SGD and provides a principled
way to design and analyze new private optimization algorithms in a flexible
manner. Focusing on the widely-used Alternating Directions Method of
Multipliers (ADMM) method, we use our general framework to derive novel private
ADMM algorithms for centralized, federated and fully decentralized learning.
For these three algorithms, we establish strong privacy guarantees leveraging
privacy amplification by iteration and by subsampling. Finally, we provide
utility guarantees using a unified analysis that exploits a recent linear
convergence result for noisy fixed-point iterations.
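Viewed as a noisy fixed-point iteration, one DP-SGD step applies the map w to w - lr * (clipped, noised mean gradient). A hedged sketch, with per-example gradients of a toy squared loss standing in for a real model:

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step: clip each per-example gradient to L2 norm `clip`,
    add Gaussian noise calibrated to the clipping bound, then descend.
    This is one application of the noisy fixed-point map w <- w - lr * g."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = (np.sum(clipped, axis=0)
             + rng.normal(0.0, noise_mult * clip, size=w.shape))
    return w - lr * noisy / len(per_example_grads)

# Toy usage: per-example gradients of the squared loss (x_i . w - y_i)^2 / 2
rng = np.random.default_rng(1)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
grads = [(x @ w - t) * x for x, t in zip(X, y)]
w_next = dp_sgd_step(w, grads)
```

The ADMM variants in the paper replace the gradient map with ADMM's proximal updates, but the same clip-and-noise pattern supplies the privacy guarantee.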
( 2
min )
Recent advancements in interpretability research have made transformer
language models more transparent. This progress has led to a better
understanding of their inner workings for both toy and naturally occurring
models. However, how these
models internally process sentiment changes has yet to be sufficiently
answered. In this work, we introduce a new interpretability tool called PCP
ablation, where we replace modules with low-rank matrices based on the
principal components of their activations, reducing model parameters and their
behavior to essentials. We demonstrate PCP ablations on MLP and attention
layers in backdoored toy, backdoored large, and naturally occurring models. We
find that MLPs are most important for the backdoor mechanism and use this
knowledge to remove, insert, and modify backdoor mechanisms with engineered
replacements via PCP ablation.
( 2
min )
We prove that the set of functions representable by ReLU neural networks with
integer weights strictly increases with the network depth while allowing
arbitrary width. More precisely, we show that $\lceil\log_2(n)\rceil$ hidden
layers are indeed necessary to compute the maximum of $n$ numbers, matching
known upper bounds. Our results are based on the known duality between neural
networks and Newton polytopes via tropical geometry. The integrality assumption
implies that these Newton polytopes are lattice polytopes. Then, our depth
lower bounds follow from a parity argument on the normalized volume of faces of
such polytopes.
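The matching upper bound comes from the classic identity max(a, b) = a + ReLU(b - a), which uses only integer weights and lets ceil(log2 n) rounds of pairwise maxima compute the maximum of n numbers; a quick sketch:

```python
import math
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_of_n(xs):
    """Compute the max of n numbers with ceil(log2 n) ReLU layers via the
    integer-weight identity max(a, b) = a + ReLU(b - a)."""
    xs, depth = list(xs), 0
    while len(xs) > 1:
        nxt = [xs[i] + relu(xs[i + 1] - xs[i]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:             # odd element carries to the next round
            nxt.append(xs[-1])
        xs, depth = nxt, depth + 1
    return xs[0], depth

val, layers = max_of_n([3.0, -1.0, 7.0, 2.0, 5.0])
# val == 7.0, layers == ceil(log2(5)) == 3
```

The paper's contribution is the converse: a parity argument on lattice Newton polytopes shows that no shallower integer-weight network can do this.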
( 2
min )
Morphological atlases are an important tool in organismal studies, and modern
high-throughput Computed Tomography (CT) facilities can produce hundreds of
full-body high-resolution volumetric images of organisms. However, creating an
atlas from these volumes requires accurate organ segmentation. In the last
decade, machine learning approaches have achieved incredible results in image
segmentation tasks, but they require large amounts of annotated data for
training. In this paper, we propose a self-training framework for multi-organ
segmentation in tomographic images of Medaka fish. We utilize the
pseudo-labeled data from a pretrained Teacher model and adopt a Quality
Classifier to refine the pseudo-labeled data. Then, we introduce a pixel-wise
knowledge distillation method to prevent overfitting to the pseudo-labeled data
and improve the segmentation performance. The experimental results demonstrate
that our method improves mean Intersection over Union (IoU) by 5.9% on the full
dataset and maintains segmentation quality while using three times less
annotated data.
( 2
min )
Studies involving both randomized experiments as well as observational data
typically involve time-to-event outcomes such as time-to-failure, death or
onset of an adverse condition. Such outcomes are typically subject to censoring
due to loss of follow-up and established statistical practice involves
comparing treatment efficacy in terms of hazard ratios between the treated and
control groups. In this paper we propose a statistical approach to recovering
sparse phenogroups (or subtypes) that demonstrate differential treatment
effects as compared to the study population. Our approach involves modelling
the data as a mixture while enforcing parameter shrinkage through structured
sparsity regularization. We propose a novel inference procedure for the
proposed model and demonstrate its efficacy in recovering sparse phenotypes
across large landmark real world clinical studies in cardiovascular health.
( 2
min )
Previous pitch-controllable text-to-speech (TTS) models rely on directly
modeling fundamental frequency, leading to low variance in synthesized speech.
To address this issue, we propose PITS, an end-to-end pitch-controllable TTS
model that utilizes variational inference to model pitch. Based on VITS, PITS
incorporates the Yingram encoder, the Yingram decoder, and adversarial training
of pitch-shifted synthesis to achieve pitch-controllability. Experiments
demonstrate that PITS generates high-quality speech that is indistinguishable
from ground truth speech and has high pitch-controllability without quality
degradation. Code and audio samples will be available at
https://github.com/anonymous-pits/pits.
( 2
min )
Effectively scaling large Transformer models is a main driver of recent
advances in natural language processing. Dynamic neural networks, as an
emerging research direction, are capable of scaling up neural networks with
sub-linear increases in computation and time by dynamically adjusting their
computational path based on the input. Dynamic neural networks could be a
promising solution to the growing parameter numbers of pretrained language
models, allowing both model pretraining with trillions of parameters and faster
inference on mobile devices. In this survey, we summarize progress of three
types of dynamic neural networks in NLP: skimming, mixture of experts, and
early exit. We also highlight current challenges in dynamic neural networks and
directions for future research.
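Of the three families, early exit is the easiest to sketch: attach a classifier head after each layer and stop as soon as a head is confident enough. The layers, heads, and threshold below are made-up toy values, not any surveyed model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run layer by layer; after each, an exit head predicts. If its max
    probability clears the threshold, return early and skip the rest."""
    h = x
    for depth, (W, head) in enumerate(zip(layers, heads), start=1):
        h = np.tanh(W @ h)
        p = softmax(head @ h)
        if p.max() >= threshold:
            return p, depth          # exited after `depth` layers
    return p, depth                  # fell through: used all layers

rng = np.random.default_rng(0)
layers = [rng.normal(size=(16, 16)) for _ in range(4)]
heads = [rng.normal(size=(3, 16)) for _ in range(4)]
probs, used = early_exit_forward(rng.normal(size=16), layers, heads)
```

Easy inputs exit early and pay for fewer layers, which is exactly the sub-linear compute scaling the survey highlights.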
( 2
min )
Contextual bandit algorithms often estimate reward models to inform
decision-making. However, true rewards can contain action-independent
redundancies that are not relevant for decision-making. We show it is more
data-efficient to estimate any function that explains the reward differences
between actions, that is, the treatment effects. Motivated by this observation,
building on recent work on oracle-based bandit algorithms, we provide the first
reduction of contextual bandits to general-purpose heterogeneous treatment
effect estimation, and we design a simple and computationally efficient
algorithm based on this reduction. Our theoretical and experimental results
demonstrate that heterogeneous treatment effect estimation in contextual
bandits offers practical advantages over reward estimation, including more
efficient model estimation and greater flexibility to model misspecification.
( 2
min )
Non-asymptotic statistical analysis is often missing for modern
geometry-aware machine learning algorithms due to the possibly intricate
non-linear manifold structure. This paper studies an intrinsic mean model on
the manifold of restricted positive semi-definite matrices and provides a
non-asymptotic statistical analysis of the Karcher mean. We also consider a
general extrinsic signal-plus-noise model, under which a deterministic error
bound of the Karcher mean is provided. As an application, we show that the
distributed principal component analysis algorithm, LRC-dPCA, achieves the same
performance as the full sample PCA algorithm. Numerical experiments lend strong
support to our theories.
( 2
min )
Traffic prediction is a flourishing research field due to its importance in
human mobility in urban spaces. Despite this, existing studies focus only on
short-term prediction of up to a few hours in advance, with most covering one
hour only. Long-term traffic prediction can enable more comprehensive,
informed, and proactive measures against traffic congestion and is therefore
an important task to explore. In this paper, we explore long-term traffic
prediction, where we predict traffic up to 24 hours in advance. We note the
weaknesses of existing models, which are based on recurrent structures, for
long-term traffic prediction and propose a modified Transformer model
``TrafFormer". Experiments comparing our model with existing hybrid neural
network models show the superiority of our model.
( 2
min )
In sponsored search advertising (SSA), keywords serve as the basic unit of the
business model, linking three stakeholders: consumers, advertisers, and search
engines. This paper presents an overarching framework for keyword decisions
that highlights the touchpoints in search advertising management, including
four levels of keyword decisions, i.e., domain-specific keyword pool
generation, keyword targeting, keyword assignment and grouping, and keyword
adjustment. Using this framework, we review the state-of-the-art research
literature on keyword decisions with respect to techniques, input features and
evaluation metrics. Finally, we discuss evolving issues and identify potential
gaps that exist in the literature and outline novel research perspectives for
future exploration.
( 2
min )
The cosmic microwave background (CMB) is a significant source of knowledge
about the origin and evolution of our universe. However, observations of the
CMB are contaminated by foreground emissions, obscuring the CMB signal and
reducing its efficacy in constraining cosmological parameters. We employ deep
learning as a data-driven approach to CMB cleaning from multi-frequency
full-sky maps. In particular, we develop a graph-based Bayesian convolutional
neural network based on the U-Net architecture that predicts cleaned CMB with
pixel-wise uncertainty estimates. We demonstrate the potential of this
technique on realistic simulated data based on the Planck mission. We show that
our model accurately recovers the cleaned CMB sky map and resulting angular
power spectrum while identifying regions of uncertainty. Finally, we discuss
the current challenges and the path forward for deploying our model for CMB
recovery on real observations.
( 2
min )
Modelling stockpiles is a key factor in the economics and operation of a
mining project, because not all mined ore can be milled, for a variety of
reasons. Further, the financial value of the ore in the stockpile needs to be
reflected on the balance sheet. Automatically tracking the frontiers of the
stockpile therefore helps mine scheduling engineers calculate the tonnage of
ore remaining in it. This paper suggests how the dynamics of stockpile shape
changes caused by dumping and reclaiming operations can be inferred using
polygon models. The presented work also demonstrates how the geometry of
stockpiles can be inferred in the absence of reclaimed-bucket information, in
which case the reclaim polygons are established from the diggers' GPS
positions at the time of truck loading. This work further compares two polygon
models for creating 2D shapes.
( 2
min )
Fast model updates for unseen tasks on intelligent edge devices are crucial
but also challenging due to limited computational power. In this paper, we
propose MetaLDC, which meta-trains brain-inspired, ultra-efficient
low-dimensional computing (LDC) classifiers to enable fast adaptation on tiny
devices with minimal computational cost. Concretely, during the meta-training
stage, MetaLDC meta-trains a representation offline by explicitly taking into
account that the final (binary) class layer will be fine-tuned for fast
adaptation to unseen tasks on tiny devices; during the meta-testing stage,
MetaLDC uses
closed-form gradients of the loss function to enable fast adaptation of the
class layer. Unlike traditional neural networks, MetaLDC is designed based on
the emerging LDC framework to enable ultra-efficient on-device inference. Our
experiments have demonstrated that compared to SOTA baselines, MetaLDC achieves
higher accuracy, robustness against random bit errors, as well as
cost-efficient hardware computation.
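The paper's fast adaptation uses closed-form gradients for the class layer only. As a loose analogue (my assumption, not the paper's exact update), a ridge-regression closed form over frozen features shows why adapting just the last layer is cheap:

```python
import numpy as np

def adapt_class_layer(H, Y, reg=1e-3):
    """Closed-form least-squares fit of the final class layer with the
    meta-trained representation frozen: W = (H^T H + reg*I)^-1 H^T Y.
    H: (n, d) frozen features; Y: (n, c) one-hot labels."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + reg * np.eye(d), H.T @ Y)

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 32))               # features from the frozen encoder
Y = np.eye(2)[rng.integers(0, 2, size=100)]  # binary one-hot labels
W = adapt_class_layer(H, Y)                  # (32, 2): one solve, no SGD loop
```

A single linear solve replaces an iterative fine-tuning loop, which is what makes per-task adaptation feasible on tiny devices.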
( 2
min )
Since its introduction in 2017, physics-informed deep learning (PIDL) has
garnered growing popularity in understanding the evolution of systems governed
by physical laws in terms of partial differential equations (PDEs). However,
empirical evidence points to the limitations of PIDL for learning certain types
of PDEs. In this paper, we (a) present the challenges in training PIDL
architecture, (b) contrast the performance of PIDL architecture in learning a
first order scalar hyperbolic conservation law and its parabolic counterpart,
(c) investigate the effect of training data sampling, which corresponds to
various sensing scenarios in traffic networks, (d) comment on the implications
of PIDL limitations for traffic flow estimation and prediction in practice.
In a detailed case study, we contrast PIDL results between learning the
traffic flow model (the LWR PDE) and its variant with diffusion. The outcome
indicates that PIDL experiences significant challenges
in learning the hyperbolic LWR equation due to the non-smoothness of its
solution. In contrast, the architecture with the parabolic PDE, augmented with
the diffusion term, successfully reconstructs the density data even in the
presence of shockwaves.
( 2
min )
Federated learning (FL) is a popular technique for training a global model on
data distributed across client devices. Like other distributed training
techniques, FL is susceptible to straggler (slower or failed) clients. Recent
work has proposed to address this through device-to-device (D2D) offloading,
which introduces privacy concerns. In this paper, we propose a novel
straggler-optimal approach for coded matrix computations which can
significantly reduce the communication delay and privacy issues introduced by
D2D data transmissions in FL. Moreover, our proposed approach leads to a
considerable improvement of the local computation speed when the generated data
matrix is sparse. Numerical evaluations confirm the superiority of our proposed
method over baseline approaches.
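The paper's coding scheme is more general, but the straggler-tolerance idea can be seen in a toy (3, 2) code for A @ x: split A row-wise, send A1, A2, and the parity A1 + A2 to three workers, and recover from any two replies (assumes an even row count so the blocks align):

```python
import numpy as np

def encode(A):
    # Split A row-wise and add one parity task; any 2 of 3 results suffice.
    A1, A2 = np.split(A, 2, axis=0)
    return [A1, A2, A1 + A2]

def decode(results):
    """results: {worker_id: partial_block @ x} with any two of {0, 1, 2}."""
    if 0 in results and 1 in results:
        return np.concatenate([results[0], results[1]])
    if 0 in results:                       # have A1@x and (A1+A2)@x
        return np.concatenate([results[0], results[2] - results[0]])
    return np.concatenate([results[2] - results[1], results[1]])

rng = np.random.default_rng(0)
A, x = rng.normal(size=(6, 4)), rng.normal(size=4)
tasks = encode(A)
replies = {i: T @ x for i, T in enumerate(tasks)}
del replies[0]                             # worker 0 straggles
recovered = decode(replies)                # equals A @ x despite the straggler
```

Because each worker sees only a coded block rather than raw client data, the same machinery limits what D2D transfers reveal, which is the privacy angle the abstract mentions.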
( 2
min )
We revisit the original approach of using deep learning and neural networks
to solve differential equations by incorporating the knowledge of the equation.
This is done by adding a dedicated term to the loss function during the
optimization procedure in the training process. The so-called physics-informed
neural networks (PINNs) are tested on a variety of academic ordinary
differential equations in order to highlight the benefits and drawbacks of this
approach with respect to standard integration methods. We focus on using as
little data as possible in the training process. The principles of PINNs for
solving differential equations by enforcing physical laws via penalty terms
are reviewed. A tutorial on a simple model equation illustrates how to put the
method into practice for ordinary differential equations. Benchmark tests show
that a very small amount of training data is sufficient to predict the
solution when the nonlinearity of the problem is weak. However, this is not
the case for strongly nonlinear problems, where a priori knowledge of training
data over part or all of the time-integration interval is necessary.
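As an illustration of the penalty term, here is a PINN-style loss for the toy problem y' = -y, y(0) = 1, with a hypothetical tiny network and central finite differences standing in for the automatic differentiation a real PINN would use:

```python
import numpy as np

def model(t, params):
    # Tiny one-hidden-layer tanh network y(t); parameters are made up here.
    w1, b1, w2, b2 = params
    return np.tanh(np.outer(t, w1) + b1) @ w2 + b2

def pinn_loss(params, t, eps=1e-4):
    """Physics term (residual of y' = -y) plus initial-condition term."""
    y = model(t, params)
    dy = (model(t + eps, params) - model(t - eps, params)) / (2 * eps)
    residual = dy + y                                # enforce y' = -y
    ic = model(np.array([0.0]), params) - 1.0        # enforce y(0) = 1
    return np.mean(residual ** 2) + np.mean(ic ** 2)

rng = np.random.default_rng(0)
params = (rng.normal(size=8), rng.normal(size=8), rng.normal(size=8), 0.0)
t = np.linspace(0.0, 2.0, 50)
loss = pinn_loss(params, t)   # minimized over params during training
```

Note that no solution data appears in the loss at all: only the equation and the initial condition, which is why so little training data can suffice.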
( 2
min )
Inferring causal structure from data is a challenging task of fundamental
importance in science. Observational data are often insufficient to identify a
system's causal structure uniquely. While conducting interventions (i.e.,
experiments) can improve the identifiability, such samples are usually
challenging and expensive to obtain. Hence, experimental design approaches for
causal discovery aim to minimize the number of interventions by estimating the
most informative intervention target. In this work, we propose a novel
Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts'
the gradient estimator of a gradient-based causal discovery framework to
provide signals for the intervention acquisition function. We provide extensive
experiments in simulated and real-world datasets and demonstrate that GIT
performs on par with competitive baselines, surpassing them in the low-data
regime.
( 2
min )
The imputation of missing values represents a significant obstacle for many
real-world data analysis pipelines. Here, we focus on time series data and put
forward SSSD, an imputation model that relies on two emerging technologies,
(conditional) diffusion models as state-of-the-art generative models and
structured state space models as internal model architecture, which are
particularly suited to capture long-term dependencies in time series data. We
demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic
imputation and forecasting performance on a broad range of data sets and
different missingness scenarios, including the challenging blackout-missing
scenarios, where prior approaches failed to provide meaningful results.
( 2
min )
Bayesian additive regression trees (BART) is a semi-parametric regression
model offering state-of-the-art performance on out-of-sample prediction.
Despite this success, standard implementations of BART typically provide
inaccurate prediction and overly narrow prediction intervals at points outside
the range of the training data. This paper proposes a novel extrapolation
strategy that grafts Gaussian processes to the leaf nodes in BART for
predicting points outside the range of the observed data. The new method is
compared to standard BART implementations and recent frequentist
resampling-based methods for predictive inference. We apply the new approach to
a challenging problem from causal inference, wherein for some regions of
predictor space, only treated or untreated units are observed (but not both).
In simulation studies, the new approach boasts superior performance compared to
popular alternatives, such as Jackknife+.
( 2
min )
We study the problem of inferring heterogeneous treatment effects (HTEs) from
time-to-event data in the presence of competing events. Despite its great
practical relevance, this problem has received little attention compared to
its counterparts studying HTE estimation without time-to-event data or
competing events. We take an outcome-modeling approach to estimating HTEs, and
consider how and when existing prediction models for time-to-event data can be
used as plug-in estimators for potential outcomes. We then investigate whether
competing events present new challenges for HTE estimation beyond the standard
confounding problem, and find that, because there are multiple definitions of
causal effects in this setting (namely total, direct, and separable effects),
competing events can act as an additional source of covariate shift depending
on the desired treatment effect interpretation and
and how these challenges play a role when using generic machine learning
prediction models for the estimation of HTEs.
( 2
min )
In this paper, we introduce two methods to solve the American-style option
pricing problem and its dual form at the same time using neural networks.
Without applying nested Monte Carlo, the first method uses a series of neural
networks to simultaneously compute both the lower and upper bounds of the
option price, and the second one accomplishes the same goal with one global
network. The avoidance of extra simulations and the use of neural networks
significantly reduce the computational complexity and allow us to price
Bermudan options with frequent exercise opportunities in high dimensions, as
illustrated by the provided numerical experiments. As a by-product, these
methods also derive a hedging strategy for the option, which can also be used
as a control variate for variance reduction.
( 2
min )
A Shared Nearest Neighbor (SNN) graph is a type of graph construction using
shared nearest neighbor information, which is a secondary similarity measure
based on the rankings induced by a primary $k$-nearest neighbor ($k$-NN)
measure. SNN measures have been touted as being less prone to the curse of
dimensionality than conventional distance measures, and thus methods using SNN
graphs have been widely used in applications, particularly in clustering
high-dimensional data sets and in finding outliers in subspaces of high
dimensional data. Despite this, the theoretical study of SNN graphs and graph
Laplacians remains unexplored. In this pioneering work, we make the first
contribution in this direction. We show that large scale asymptotics of an SNN
graph Laplacian reach a consistent continuum limit; this limit is the same as
that of a $k$-NN graph Laplacian. Moreover, we show that the pointwise
convergence rate of the graph Laplacian is linear with respect to $(k/n)^{1/m}$
with high probability.
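Concretely, the secondary similarity counts shared members of the primary k-NN lists; a small sketch building an SNN weight matrix from Euclidean k-NN:

```python
import numpy as np

def knn_indices(X, k):
    # Primary measure: Euclidean k-NN per point, self excluded.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def snn_weights(X, k):
    """Secondary (SNN) measure: W[i, j] = number of shared k-nearest
    neighbors of i and j; the SNN graph and its Laplacian are built from W."""
    nn = knn_indices(X, k)
    n = len(X)
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = len(set(nn[i]) & set(nn[j]))
    return W

X = np.random.default_rng(0).normal(size=(30, 3))
W = snn_weights(X, k=5)       # entries lie in {0, ..., 5}
```

The paper's result says that, asymptotically, the graph Laplacian built from W behaves like the one built directly from the primary k-NN graph.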
( 2
min )
Hi everyone, I'm doing a personal project on what people think about music-generating AIs. It would be very helpful if you could take about 5 minutes to complete this survey. Thank you so much for your participation.
https://docs.google.com/forms/d/e/1FAIpQLSfLHjRaWAsdGrK6Zn8X-CW17Vjn0W8EJEwEflnX7ucWn2eGBA/viewform?usp=pp_url
submitted by /u/KindlyGuess419
[link] [comments]
( 41
min )
Microsoft hooks ChatGPT up to a robot, NVIDIA promises to improve AI performance 1 million times over the next decade, AWS hugs Hugging Face, ControlNet takes image generation by storm, and more -
https://scottswigart.substack.com/p/whats-new-in-generative-ai-2023-02
submitted by /u/smswigart
[link] [comments]
( 41
min )
Meet the Google for Startups Accelerator Canada class of 2023!
Bidmii is an online marketplace that quickly connects homeowners and contractors for home improvement projects, guaranteeing payment security for each party by holding payments in trust.
Chimoney enables businesses to send payments to phones, emails and Twitter, regardless of scale, currency, country and other factors.
Clavis Studio is an AI and machine learning (ML)-driven design, visualization, and sourcing platform that provides a marketplace for designers and decorators to source new clients and use supporting tools to deliver their projects.
Foqus Technologies is an AI and quantitative imaging technology company that designs and develops software solutions to enhance the speed and quality of MRI scans.
Gryd Digital …
( 43
min )
We have ancient biology, medieval institutions, and we are approaching godlike technology. There are so many nightmares that could play out and we have to be conscious of them at all times. Setting up AI systems correctly and ensuring that our rulers are responsible is the number one priority. But what happens if we do manage to retain control and agency?
If humanity can pull this off, then perhaps we can begin to imagine the incredible potential that awaits us. We are about to be the human beings that get to live through this incredible and most crucial period. What more incredible and meaningful time could there be, than getting to see and be a part of the potential transformation of our species?
https://youtu.be/TQ36hkxIx74
This video explores the concepts postulated by AI philosophers Nick Bostrom and Ray Kurzweil and entertains a cautious optimism about the future of humanity.
submitted by /u/Allisblissallislife
[link] [comments]
( 44
min )
Model tuning is the experimental process of finding the optimal parameters and configurations for a machine learning (ML) model that result in the best possible desired outcome with a validation dataset. Single objective optimization with a performance metric is the most common approach for tuning ML models. However, in addition to predictive performance, there may […]
( 12
min )
https://www.legoscript.com/these-companies-are-replacing-workers-with-chatgpt-
submitted by /u/pyactee
[link] [comments]
( 41
min )
As computing and AI advancements spanning decades are enabling incredible opportunities for people and society, they’re also raising questions about responsible development and deployment. For example, the machine learning models powering AI systems may not perform the same for everyone or every condition, potentially leading to harms related to safety, reliability, and fairness. Single metrics […]
The post Responsible AI: The research collaboration behind new open-source tools offered by Microsoft appeared first on Microsoft Research.
( 13
min )
There are a lot of chatbot-based apps that are basically internet text generators with a bit of introductory stage-setting to nudge the interaction into "user talks to helpful chatbot" as opposed to literally any other dialog on the web. Not surprisingly, these are susceptible to a user resetting
( 5
min )
AI Weirdness: the strange side of machine learning
( 2
min )
From scaling mountains in the annual California Death Ride bike challenge to creating a low-cost, open-source ventilator in the early days of the COVID-19 pandemic, NVIDIA Chief Scientist Bill Dally is no stranger to accomplishing near-impossible feats. On Friday, he achieved another rare milestone: induction into the Silicon Valley Engineering Council’s Hall of Fame. The Read article >
( 5
min )
Telcos are seeking industry-standard solutions that can run 5G, AI applications and immersive graphics workloads on the same server — including for computer vision and the metaverse. To meet this need, NVIDIA is developing a new AI-on-5G solution that combines 5G vRAN, edge AI and digital twin workloads on an all-in-one, hyperconverged and GPU-accelerated system. Read article >
( 5
min )
I created two AI ChatGPT Wizards that rap battle based on topics in the twitch chat.
https://www.twitch.tv/fleetyfleet
submitted by /u/fleetisme
Artificial intelligence (AI) is one of the most discussed technologies nowadays. It can alter how we live and work, yet there are concerns about its societal impact. In this blog post, we will look at the benefits and drawbacks of artificial intelligence.
submitted by /u/Boce77
Before
Original Image: https://i.ibb.co/2t1XdZQ/13er.jpg (By Getty Images)
After
Version 1: https://i.ibb.co/ZYqP1LB/1903163b-ed82-4676-b220-84d194557ac3.jpg
Version 2: https://i.ibb.co/phqQK2g/ca4b8237-7986-461d-bf4c-3c47427f2be3.png
My Question
Do these look good to you guys? Please feel free to give me some feedback. Thanks!
submitted by /u/Jealous_Ad8132
Hidden Markov Model implementations in R and Python for discrete and continuous observations. I have a tutorial on YouTube that explains the use and modeling of HMMs and how to run these two packages.
Code:
https://github.com/manitadayon/CD_HMM (in R)
https://github.com/manitadayon/Auto_HMM (In Python)
Tutorial:
https://www.youtube.com/watch?v=1b-sd7gulFk&ab_channel=AIandMLFundamentals
https://www.youtube.com/watch?v=ieU8JFLRw2k&ab_channel=AIandMLFundamentals
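For anyone new to HMMs, here is a minimal sketch of the forward algorithm, the likelihood computation that both packages build on. This is an illustrative implementation written for this post, not code from either repository:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Likelihood of a discrete observation sequence under an HMM.

    pi: initial state probabilities, shape (S,)
    A:  state transition matrix, shape (S, S)
    B:  emission matrix, shape (S, V)
    obs: sequence of observation symbol indices
    """
    # alpha[s] = P(observations so far, current state = s)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate states, then weight by emission
    return alpha.sum()
```

Both packages implement far more than this (training, decoding, continuous emissions), but every HMM likelihood query reduces to this recursion.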
submitted by /u/chess9145
Hi, sorry for what is likely a dumb question; I'm relatively new to these topics.
I have a file containing rows with variable length and a class (defined by value 0 or 1).
Is it possible (and does it make sense?) to use a k-nearest neighbors classifier on variable-length input data? The file looks something like this: https://gist.github.com/edoardottt/46dd13c60408e95c1685ee88b5f6ace8
Thanks!
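Edit: in case it helps others, one common workaround (not the only one) is to map each variable-length row to a fixed-length summary vector first, since k-NN needs a fixed-dimensional distance. The features below (length, mean, min, max) are just illustrative choices, not a recommendation for this particular dataset:

```python
import numpy as np
from collections import Counter

def featurize(row):
    # Fixed-length summary of a variable-length numeric row.
    r = np.asarray(row, dtype=float)
    return np.array([len(r), r.mean(), r.min(), r.max()])

def knn_predict(train_rows, train_labels, query_row, k=3):
    X = np.stack([featurize(r) for r in train_rows])
    q = featurize(query_row)
    dists = np.linalg.norm(X - q, axis=1)      # Euclidean distance in feature space
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]          # majority vote among k neighbors
```

Whether this makes sense depends on whether such summaries actually separate your two classes; alternatives include padding/truncating rows to a fixed length, or sequence-aware distances like DTW.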
submitted by /u/edoardottt
To design with AI models, user experience (UX) designers must assess the fit
between the model and user needs. Based on user research, they need to
contextualize the model's behavior and potential failures within their
product-specific data instances and user scenarios. However, our formative
interviews with ten UX professionals revealed that such a proactive discovery
of model limitations is challenging and time-intensive. Furthermore, designers
often lack technical knowledge of AI and accessible exploration tools, which
challenges their understanding of model capabilities and limitations. In this
work, we introduced a failure-driven design approach to AI, a workflow that
encourages designers to explore model behavior and failure patterns early in
the design process. The implementation of fAIlureNotes, a designer-centered
failure exploration and analysis tool, supports designers in evaluating models
and identifying failures across diverse user groups and scenarios. Our
evaluation with UX practitioners shows that fAIlureNotes outperforms today's
interactive model cards in assessing context-specific model performance.
Knowledge tracing (KT) serves as a primary part of intelligent education
systems. Most current KTs either rely on expert judgments or only exploit a
single network structure, which affects the full expression of learning
features. To adequately mine features of students' learning process, Deep
Knowledge Tracing Based on Spatial and Temporal Deep Representation Learning
for Learning Performance Prediction (DKT-STDRL) is proposed in this paper.
DKT-STDRL extracts spatial features from students' learning history sequence,
and then further extracts temporal features to extract deeper hidden
information. Specifically, the DKT-STDRL model first uses a CNN to extract the
spatial feature information of students' exercise sequences. The spatial
features are then concatenated with the original students' exercise features to
form joint learning features, which are input into the BiLSTM part.
Finally, the BiLSTM part extracts the temporal features from the joint learning
features to obtain the prediction information of whether the students answer
correctly at the next time step. Experiments on the public education datasets
ASSISTment2009, ASSISTment2015, Synthetic-5, ASSISTchall, and Statics2011 prove
that DKT-STDRL can achieve better prediction effects than DKT and CKT.
Despite their growing popularity, data-driven models of real-world dynamical
systems require lots of data. However, due to sensing limitations as well as
privacy concerns, this data is not always available, especially in domains such
as energy. Pre-trained models using data gathered in similar contexts have
shown enormous potential in addressing these concerns: they can improve
predictive accuracy at a much lower observational data expense. Theoretically,
due to the risk posed by negative transfer, this improvement is, however,
neither uniform across agents nor guaranteed.
several distributed energy resources, we investigate and report preliminary
findings on several key questions in this regard. First, we evaluate the
improvement in predictive accuracy due to pre-trained models, both with and
without fine-tuning. Subsequently, we consider the question of fairness: do
pre-trained models create equal improvements for heterogeneous agents, and how
does this translate to downstream utility? Answering these questions can help
enable improvements in the creation, fine-tuning, and adoption of such
pre-trained models.
We propose a new supervised learning method for Variational AutoEncoder (VAE)
which has a causally disentangled representation and achieves the causally
disentangled generation (CDG) simultaneously. In this paper, CDG is defined as
a generative model able to decode an output precisely according to the causally
disentangled representation. We found that the supervised regularization of the
encoder is not enough to obtain a generative model with CDG. Consequently, we
explore sufficient and necessary conditions for the decoder and the causal
effect to achieve CDG. Moreover, we propose a generalized metric measuring how
a model is causally disentangled generative. Numerical results with the image
and tabular datasets corroborate our arguments.
Our goal is to produce methods for observational causal inference that are
auditable, easy to troubleshoot, yield accurate treatment effect estimates, and
scalable to high-dimensional data. We describe an almost-exact matching
approach that achieves these goals by (i) learning a distance metric via
outcome modeling, (ii) creating matched groups using the distance metric, and
(iii) using the matched groups to estimate treatment effects. Our proposed
method uses variable importance measurements to construct a distance metric,
making it a flexible method that can be adapted to various applications.
Concentrating on the scalability of the problem in the number of potential
confounders, we operationalize our approach with LASSO. We derive performance
guarantees for settings where LASSO outcome modeling consistently identifies
all confounders (importantly without requiring the linear model to be correctly
specified). We also provide experimental results demonstrating the auditability
of matches, as well as extensions to more general nonparametric outcome
modeling.
Deep learning approaches require collection of data on many different input
features or variables for accurate model training and prediction. Since data
collection on input features could be costly, it is crucial to reduce the cost
by selecting a subset of features and developing a budget-constrained model
(BCM). In this paper, we introduce an approach to eliminating less important
features for big data analysis using Deep Neural Networks (DNNs). Once a DNN
model has been developed, we identify the weak links and weak neurons, and
remove some input features to bring the model cost within a given budget. The
experimental results show our approach is feasible and supports user selection
of a suitable BCM within a given budget.
Deep networks are susceptible to numerous types of adversarial attacks.
Certified defenses provide guarantees on a model's robustness, but most of
these defenses are restricted to a single attack type. In contrast, this paper
proposes feature partition aggregation (FPA) - a certified defense against a
union of attack types, namely evasion, backdoor, and poisoning attacks. We
specifically consider an $\ell_0$ or sparse attacker that arbitrarily controls
an unknown subset of the training and test features - even across all
instances. FPA generates robustness guarantees via an ensemble whose submodels
are trained on disjoint feature sets. Following existing certified sparse
defenses, we generalize FPA's guarantees to top-$k$ predictions. FPA
significantly outperforms state-of-the-art sparse defenses providing larger and
stronger robustness guarantees, while simultaneously being up to
5,000${\times}$ faster.
Bernstein's condition is a key assumption that guarantees fast rates in
machine learning. For example, the Gibbs algorithm with prior $\pi$ has an
excess risk in $O(d_{\pi}/n)$, as opposed to the standard
$O(\sqrt{d_{\pi}/n})$, where $n$ denotes the number of observations and
$d_{\pi}$ is a complexity parameter which depends on the prior $\pi$. In this
paper, we examine the Gibbs algorithm in the context of meta-learning, i.e.,
when learning the prior $\pi$ from $T$ tasks (with $n$ observations each)
generated by a meta distribution. Our main result is that Bernstein's condition
always holds at the meta level, regardless of its validity at the observation
level. This implies that the additional cost to learn the Gibbs prior $\pi$,
which will reduce the term $d_\pi$ across tasks, is in $O(1/T)$, instead of the
expected $O(1/\sqrt{T})$. We further illustrate how this result improves on
standard rates in three different settings: discrete priors, Gaussian priors
and mixture of Gaussians priors.
Deep learning is a crucial aspect of machine learning, but it also makes
these techniques vulnerable to adversarial examples, which can be seen in a
variety of applications. These examples can even be targeted at humans, leading
to the creation of false media, such as deepfakes, which are often used to
shape public opinion and damage the reputation of public figures. This article
will explore the concept of adversarial examples, which are comprised of
perturbations added to clean images or videos, and their ability to deceive DL
algorithms. The proposed approach achieved an accuracy of 76.2% on the DFDC
dataset.
Model parallelism is conventionally viewed as a method to scale a single
large deep learning model beyond the memory limits of a single device. In this
paper, we demonstrate that model parallelism can be additionally used for the
statistical multiplexing of multiple devices when serving multiple models, even
when a single model can fit into a single device. Our work reveals a
fundamental trade-off between the overhead introduced by model parallelism and
the opportunity to exploit statistical multiplexing to reduce serving latency
in the presence of bursty workloads. We explore the new trade-off space and
present a novel serving system, AlpaServe, that determines an efficient
strategy for placing and parallelizing collections of large deep learning
models across a distributed cluster. Evaluation results on production workloads
show that AlpaServe can process requests at up to 10x higher rates or 6x more
burstiness while staying within latency constraints for more than 99% of
requests.
Explainable Artificial Intelligence (XAI) techniques are frequently required
by users in many AI systems with the goal of understanding complex models,
their associated predictions, and gaining trust. While suitable for some
specific tasks during development, their adoption by organisations to enhance
trust in machine learning systems has unintended consequences. In this paper we
discuss XAI's limitations in deployment and conclude that transparency,
together with rigorous validation, is better suited to gaining trust in AI
systems.
Despite the popularity of low-rank matrix completion, the majority of its
theory has been developed under the assumption of random observation patterns,
whereas very little is known about the practically relevant case of non-random
patterns. Specifically, a fundamental yet largely open question is to describe
patterns that allow for unique or finitely many completions. This paper
provides two such families of patterns for any rank. A key to achieving this is
a novel formulation of low-rank matrix completion in terms of Plucker
coordinates, the latter a traditional tool in computer vision. This connection
is of potential significance to a wide family of matrix and subspace learning
problems with incomplete data.
We study the statistical properties of learning to defer (L2D) to multiple
experts. In particular, we address the open problems of deriving a consistent
surrogate loss, confidence calibration, and principled ensembling of experts.
Firstly, we derive two consistent surrogates -- one based on a softmax
parameterization, the other on a one-vs-all (OvA) parameterization -- that are
analogous to the single expert losses proposed by Mozannar and Sontag (2020)
and Verma and Nalisnick (2022), respectively. We then study the frameworks'
ability to estimate P( m_j = y | x ), the probability that the jth expert will
correctly predict the label for x. Theory shows the softmax-based loss causes
mis-calibration to propagate between the estimates while the OvA-based loss
does not (though in practice, we find there are trade offs). Lastly, we propose
a conformal inference technique that chooses a subset of experts to query when
the system defers. We perform empirical validation on tasks for galaxy, skin
lesion, and hate speech classification.
Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a
rank-k approximation of an N x N positive semidefinite (psd) matrix. RPCholesky
can be implemented with just a few lines of code. It requires only (k+1)N entry
evaluations and O(k^2 N) additional arithmetic operations. This paper offers
the first serious investigation of its experimental and theoretical behavior.
Empirically, RPCholesky matches or improves on the performance of alternative
algorithms for low-rank psd approximation. Furthermore, RPCholesky provably
achieves near-optimal approximation guarantees. The simplicity, effectiveness,
and robustness of this algorithm strongly support its use in scientific
computing and machine learning applications.
Understanding when and how much a model gradient leaks information about the
training sample is an important question in privacy. In this paper, we present
a surprising result: even without training or memorizing the data, we can fully
reconstruct the training samples from a single gradient query at a randomly
chosen parameter value. We prove the identifiability of the training data under
mild conditions: with shallow or deep neural networks and a wide range of
activation functions. We also present a statistically and computationally
efficient algorithm based on tensor decomposition to reconstruct the training
data. As a provable attack that reveals sensitive training data, our findings
suggest potential severe threats to privacy, especially in federated learning.
Bayesian Optimization is a useful tool for experiment design. Unfortunately,
the classical, sequential setting of Bayesian Optimization does not translate
well into laboratory experiments, for instance battery design, where
measurements may come from different sources and their evaluations may require
significant waiting times. Multi-fidelity Bayesian Optimization addresses the
setting with measurements from different sources. Asynchronous batch Bayesian
Optimization provides a framework to select new experiments before the results
of the prior experiments are revealed. This paper proposes an algorithm
combining multi-fidelity and asynchronous batch methods. We empirically study
the algorithm behavior, and show it can outperform single-fidelity batch
methods and multi-fidelity sequential methods. As an application, we consider
designing electrode materials for optimal performance in pouch cells using
experiments with coin cells to approximate battery performance.
Happy Friday! Register now for a webinar we have coming up next Tuesday at 12PM ET: Architectures for Running ML at the Edge, presented by ODSC! Registration is free, sign up here.
In this webinar, we will explore different paradigms for edge deployment of ML models, including federated learning, cloud-edge hybrid architectures, and standalone edge models. We will discuss the trade-offs and considerations for each, as well as best practices for designing and deploying ML models at the edge.
Tune in Tuesday Feb. 28 @ 12PM ET. Register here.
submitted by /u/modzykirsten
Hi guys,
I have made a video on YouTube here where I explain what gradient boosting is and how it works.
I hope it may be of use to some of you out there. As always, feedback is more than welcome! :)
submitted by /u/Personal-Trainer-541
AI has the potential to revolutionize fraud detection by financial institutions, providing faster and more accurate detection of fraudulent activities. Here we present some ways in which AI can be used to detect and prevent fraud. https://youtu.be/luX9ecRwn_c
submitted by /u/eprepsg
https://twitter.com/GuillaumeLample/status/1629151231800115202?t=4cLD6Ko2Ld9Y3EIU72-M2g&s=19
Paper here - https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
submitted by /u/MysteryInc152
Excited to share "Minds", a new way to build backends and workflows entirely with AI (LLMs from OpenAI and Cohere). The AI can call your APIs, look things up in your database, etc.
With just a couple of lines of code you can build things like a question-answering service where the AI queries your local database to help answer customer support queries.
https://github.com/dosco/minds
submitted by /u/gsvclass
A recent podcast interview with EY has gone a bit viral, and in it he claims that researchers have dismissed his views without seriously engaging with his arguments, which are described here in relative detail.
I'm aware of on-going AI safety and interpretability research, but the dual use of the term "AI safety" to mean something close to AI ethics, and something close to preventing an existential threat to humanity, makes distinguishing the goals of, say, Anthropic, and the extent to which they consider the latter a serious concern, difficult as a layperson.
I haven't personally found EY's arguments to be particularly rigorous, but I'm not the best suited person to evaluate their validity. Any thoughts are appreciated. Thanks in advance!
submitted by /u/SchmidhuberDidIt
In this blog post, we discuss how to accelerate disaster response efforts using computer vision techniques to process satellite imagery with AWS services.
Amazon SageMaker multi-model endpoints (MMEs) provide a scalable and cost-effective way to deploy a large number of machine learning (ML) models. It gives you the ability to deploy multiple ML models in a single serving container behind a single endpoint. From there, SageMaker manages loading and unloading the models and scaling resources on your behalf […]
Cloudy British weather is the butt of many jokes — but the United Kingdom’s national power grid is making the most of its sunshine. With the help of Open Climate Fix, a nonprofit product lab, the control room of the National Grid Electricity System Operator (ESO) is testing AI models that provide granular, near-term forecasts.
I am looking at OpenAI's implementation of SAC over here. Also, here is their code to compute the action and its log prob -
class SquashedGaussianMLPActor(nn.Module):

    def __init__(self, obs_dim, act_dim, hidden_sizes, activation, act_limit):
        super().__init__()
        self.net = mlp([obs_dim] + list(hidden_sizes), activation, activation)
        self.mu_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.log_std_layer = nn.Linear(hidden_sizes[-1], act_dim)
        self.act_limit = act_limit

    def forward(self, obs, deterministic=False, with_logprob=True):
        net_out = self.net(obs)
        mu = self.mu_layer(net_out)
        log_std = self.log_std_layer(net_out)
        log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
        std = torch.exp(log_std)

        # Pre-squash distribution and sample
        pi_distribution = Normal(mu, std)
        if deterministic:
            # O…
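What the truncation cuts off is the log-probability computation for the tanh-squashed sample. As a standalone sketch (illustrative, not necessarily line-for-line what the OpenAI code does), the change-of-variables correction uses the numerically stable identity log(1 - tanh(u)^2) = 2(log 2 - u - softplus(-2u)):

```python
import math

import torch
from torch.distributions.normal import Normal

def squashed_gaussian_logprob(mu, std, pi_action):
    # Log-prob of the pre-squash sample u under N(mu, std), summed over action dims.
    logp_pi = Normal(mu, std).log_prob(pi_action).sum(axis=-1)
    # Correction for a = tanh(u): subtract log|da/du| = log(1 - tanh(u)^2),
    # computed via the numerically stable softplus form.
    logp_pi -= (2 * (math.log(2.0) - pi_action
                     - torch.nn.functional.softplus(-2 * pi_action))).sum(axis=-1)
    return logp_pi
```

The naive torch.log(1 - torch.tanh(u) ** 2) underflows to -inf for large |u| in float32, which is why implementations prefer the softplus form.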
"Hotter take: ML would have advanced faster if another front-end language had been available and widely adopted instead of Python. One that is interactive yet fast & compilable, multithreaded (no GIL), isn't bloated, doesn't care about white spaces,... E.g. Julia or some Lisp."
Link from the original tweet
submitted by /u/Marcapiel
Over the last 10 years, a number of players have developed autonomous vehicle (AV) systems using deep neural networks (DNNs). These systems have evolved from simple rule-based systems to Advanced Driver Assistance Systems (ADAS) and fully autonomous vehicles. These systems require petabytes of data and thousands of compute units (vCPUs and GPUs) to train. This […]
https://www.legoscript.com/we-will-die-if-not-careful
submitted by /u/pyactee
Discover the top 5 uses of UI/UX design in 2023. Engage your users, increase conversion rates, and boost ROI with better user experiences.
The post Maximizing Business Success with UI/UX Design: The Top 5 Advantages appeared first on Data Science Central.
The do-it-yourself climate modeling movement is here. Researchers from Northwestern University and Argonne National Laboratory have been launching NVIDIA Jetson-driven edge computing Waggle devices across the globe to collect hyper-local climate information. Waggle is an open source sensor platform for edge computing developed by Argonne. Working with this, scientists share open-source AI code designed for …
A million developers across the globe are now using the NVIDIA Jetson platform for edge AI and robotics to build innovative technologies. Plus, more than 6,000 companies — a third of which are startups — have integrated the platform with their products. These milestones and more will be celebrated during the NVIDIA Jetson Edge AI …
To drive the automotive industry forward, NVIDIA and Mercedes-Benz are taking the virtual road. NVIDIA founder and CEO Jensen Huang joined Mercedes-Benz CEO Ola Källenius on stage at the automaker’s strategy update event yesterday in Silicon Valley, showcasing progress in their landmark partnership to digitalize the entire product lifecycle, plus the ownership and automated driving …
The cloud just got bigger. NVIDIA and Microsoft announced this week they’re working to bring top PC Xbox Game Studios games to the GeForce NOW library, including titles from Bethesda, Mojang Studios and Activision, pending closure of Microsoft’s acquisition. With six new games joining the cloud this week for members to stream, it’s a jam-packed …
This post is co-written with Swagata Ashwani, Senior Data Scientist at Boomi. Boomi is an enterprise-level software as a service (SaaS) independent software vendor (ISV) that creates developer enablement tooling for software engineers. These tools integrate via API into Boomi’s core service offering. In this post, we discuss how Boomi used the bring-your-own-container (BYOC) approach […]
"Deep learning is the only thing that currently works at scale; it's the only class of algorithms that is able to discover arbitrary functions in a reasonable amount of time."
https://www.youtube.com/watch?v=p-OYPRhqRCg
I know of the universal approximation theorem. But is there any mathematical formulation of this statement?
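For reference, the closest classical statement I know of is the universal approximation theorem (Cybenko 1989; Hornik 1991). Note it only guarantees that an approximator exists, not that one can be found in a reasonable amount of time, so it does not by itself formalize the quoted claim:

```latex
% f continuous on a compact set K \subset \mathbb{R}^d, \sigma a suitable
% (e.g. non-polynomial) activation; one-hidden-layer networks are dense in C(K):
\forall \varepsilon > 0 \;\; \exists N,\ \{a_i, b_i, w_i\} : \quad
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} a_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \Bigr| < \varepsilon
```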
submitted by /u/GraciousReformer
Laptops equipped with NVIDIA GeForce RTX 4070, 4060 and 4050 GPUs are now available. The new lineup — including NVIDIA Studio-validated laptops from ASUS, GIGABYTE and Samsung — gives creators more options to create from anywhere with lighter, thinner devices that dramatically exceed the performance of the last generation.
Similar to product explainer videos like these: https://www.youtube.com/playlist?list=PL2P1Z-F3mmqxsMlpCp6wpeqAqlusiuZ_h
I've tried different services, but either the result was not good enough (e.g., Steve.ai has a "script to animation" feature, but the result was very limited) or the service didn't cover the script-to-video case (e.g., https://www.synthesia.io/).
submitted by /u/muran123456
I have a lot of photos on my portfolio website and usually post them on social media in series, like this example, but I want to find some new and creative ways to combine/curate photos that are visually appealing. To come up with ideas outside of my own head, I thought maybe there is a tool that can help.
submitted by /u/Northlandscapes
After you build, train, and evaluate your machine learning (ML) model to ensure it’s solving the intended business problem proposed, you want to deploy that model to enable decision-making in business operations. Models that support business-critical functions are deployed to a production environment where a model release strategy is put in place. Given the nature […]
We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision models used to create generative AI applications. Generative AI applications can perform a variety of tasks, including text summarization, answering questions, code generation, image creation, and writing essays and articles. AWS […]
DSC Weekly 21 February 2023 – Data Passivity and the Current Obsession with Off-The-Shelf Chatbots: Last September, Bill Schmarzo ("Point – Counterpoint on Why Organizations Suck at AI") listed a few common excuses enterprises use to explain why they aren't doing more with AI: We Don't Have the Right Talent. "We can't hire the right talent and don't have bottomless budgets…"
The post DSC Weekly 21 February 2023 – Data Passivity and the Current Obsession with Off-The-Shelf Chatbots appeared first on Data Science Central.
With every passing year, data analytics services are gaining more prominence as most enterprises are realizing the potential of data in driving important business decisions. The growing availability of data, developments in technology, and mounting demand for data-driven insights will contribute to this trend. Additionally, the upsurge of big data and cloud computing will make it easier…
Cybercriminals still attack startup businesses even though they may have smaller databases and less information to steal compared to the big players in the market. Why? Bad actors take the path of least resistance, and startups tend to be less equipped to defend against cyber attacks, spending an average of $500 or less on cybersecurity…
The telecommunications industry has for decades helped advance revolutionary change – enabling everything from telephones and television to online streaming and self-driving cars. Yet the industry has long been considered an evolutionary mover in its own business. A recent survey of more than 400 telecommunications industry professionals from around the world found that same cautious…
Structural information of phylogenetic tree topologies plays an important
role in phylogenetic inference. However, finding appropriate topological
structures for specific phylogenetic inference tasks often requires significant
design effort and domain expertise. In this paper, we propose a novel
structural representation method for phylogenetic inference based on learnable
topological features. By combining the raw node features that minimize the
Dirichlet energy with modern graph representation learning techniques, our
learnable topological features can provide efficient structural information of
phylogenetic trees that automatically adapts to different downstream tasks
without requiring domain expertise. We demonstrate the effectiveness and
efficiency of our method on a simulated data tree probability estimation task
and a benchmark of challenging real data variational Bayesian phylogenetic
inference problems.
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular
adaptive (self-tuning) method for first-order stochastic optimization. Despite
being well studied, existing analyses of this method suffer from various
shortcomings: they either assume some knowledge of the problem parameters,
impose strong global Lipschitz conditions, or fail to give bounds that hold
with high probability. We provide a comprehensive analysis of this basic method
without any of these limitations, in both the convex and non-convex (smooth)
cases, that additionally supports a general "affine variance" noise model and
provides sharp rates of convergence in both the low-noise and high-noise
regimes.
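For readers unfamiliar with the method the abstract analyzes, here is a minimal self-contained sketch of SGD with AdaGrad stepsizes (the self-tuning idea: each coordinate's stepsize is eta divided by the root of its accumulated squared gradients); the test function and constants below are illustrative choices, not from the paper:

```python
import numpy as np

def adagrad(grad, x0, eta=1.0, eps=1e-8, steps=500):
    """AdaGrad sketch: per-coordinate stepsize eta / sqrt(sum of squared grads)."""
    x = np.asarray(x0, dtype=float)
    g2 = np.zeros_like(x)              # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        x -= eta * g / (np.sqrt(g2) + eps)
    return x

# Badly scaled quadratic f(x) = 0.5 * (100 * x0**2 + x1**2); minimizer is 0.
x_min = adagrad(lambda x: np.array([100 * x[0], x[1]]), x0=[1.0, 1.0])
```

The adaptive stepsizes handle the 100:1 scale mismatch between coordinates without any per-problem tuning, which is the point of the method.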
In this paper, we investigate the impact of stochasticity and large stepsizes
on the implicit regularisation of gradient descent (GD) and stochastic gradient
descent (SGD) over diagonal linear networks. We prove the convergence of GD and
SGD with macroscopic stepsizes in an overparametrised regression setting and
characterise their solutions through an implicit regularisation problem. Our
crisp characterisation leads to qualitative insights about the impact of
stochasticity and stepsizes on the recovered solution. Specifically, we show
that large stepsizes consistently benefit SGD for sparse regression problems,
while they can hinder the recovery of sparse solutions for GD. These effects
are magnified for stepsizes in a tight window just below the divergence
threshold, in the "edge of stability" regime. Our findings are supported by
experimental results.
We develop inductive biases for the machine learning of complex physical
systems based on the port-Hamiltonian formalism. To satisfy by construction the
principles of thermodynamics in the learned physics (conservation of energy,
non-negative entropy production), we modify accordingly the port-Hamiltonian
formalism so as to achieve a port-metriplectic one. We show that the
constructed networks are able to learn the physics of complex systems by parts,
thus alleviating the burden associated with the experimental characterization
and subsequent learning process for such systems. Predictions can be done,
however, at the scale of the complete system. Examples are shown on the
performance of the proposed technique.
Federated learning (FL) is a privacy-preserving learning technique that
enables distributed computing devices to train shared learning models across
data silos collaboratively. Existing FL works mostly focus on designing
advanced FL algorithms to improve the model performance. However, the economic
considerations of the clients, such as fairness and incentive, are yet to be
fully explored. Without such considerations, self-motivated clients may lose
interest and leave the federation. To address this problem, we designed a novel
incentive mechanism that involves a client selection process to remove
low-quality clients and a money transfer process to ensure a fair reward
distribution. Our experimental results strongly demonstrate that the proposed
incentive mechanism can effectively improve the duration and fairness of the
federation.
Hello everyone,
It's that time again, thank you all so much for the support you've given us over here. I've done a ton of typing this morning, so for a summary of what I've updated, you can see the higher-level twitter thread I wrote at https://twitter.com/hi_tysam/status/1627679672988319746?cxt=HHwWhIC-yb2C15YtAAAA, or the more detailed (but still rough cut) patch notes I wrote this morning at https://github.com/tysam-code/hlb-CIFAR10/releases/tag/v0.5.0
Happy to answer any questions anyone might have, cheers! :D :))))
submitted by /u/tysam_and_co
In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Stable Diffusion is a deep learning model that allows you to generate realistic, high-quality images and stunning art in just a few seconds. Although creating impressive images can find use in industries ranging from […]
https://avaturn.me/
submitted by /u/theaiguru
I have written a blog post explaining the Barlow Twins paper from Meta AI. Can you guys have a read and provide suggestions to improve it further? Thanks in advance!
https://pmgautam.com/posts/barlow-twins-explanation.html
submitted by /u/pmgautam_
I am following this implementation of ddpg and found this code -
self.linear3.weight.data.uniform_(-init_w, init_w)
It seems like the author is forcing the weights of the final layer to follow a uniform distribution.
Why is the author only replacing the final layer weights?
How does uniform weight initialization help?
I have heard a lot about the usefulness of Orthogonal initialization. This is the first time, I have seen the above type of initialization.
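For context: the DDPG paper (Lillicrap et al., 2016) initializes only the *final* actor/critic layers from a small uniform range, so initial actions and Q-value estimates start near zero (and the tanh output stays in its near-linear region), while hidden layers use a fan-in scaled uniform range. A minimal numpy sketch of both schemes; `init_w = 3e-3` is the value commonly used for the low-dimensional case, stated here as an assumption rather than a definitive reading of that codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

def fanin_init(fan_in, fan_out):
    # Hidden layers: uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)]
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def final_layer_init(fan_in, fan_out, init_w=3e-3):
    # Final layer: tiny uniform weights -> near-zero initial outputs,
    # mirroring weight.data.uniform_(-init_w, init_w) in the question
    return rng.uniform(-init_w, init_w, size=(fan_in, fan_out))

W2 = fanin_init(400, 300)
W3 = final_layer_init(300, 1)
```

So the uniform range here is less about the distribution family and more about the *scale*: earlier layers keep activations well-conditioned, while the last layer deliberately starts almost silent.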
submitted by /u/Academic-Rent7800
This note describes a new approach to classifying graphs that leverages graph
generative models (GGM). Assuming a GGM that defines a joint probability
distribution over graphs and their class labels, I derive classification
formulas for the probability of a class label given a graph. A new conditional
ELBO can be used to train a generative graph auto-encoder model for
discrimination. While leveraging generative models for classification has been
well explored for non-relational i.i.d. data, to our knowledge it is a novel
approach to graph classification.
This work explains in detail the theory behind Complex-Valued Neural Network
(CVNN), including Wirtinger calculus, complex backpropagation, and basic
modules such as complex layers, complex activation functions, or complex weight
initialization. We also show the impact of not adapting the weight
initialization correctly to the complex domain. This work presents a strong
focus on the implementation of such modules on Python using cvnn toolbox. We
also perform simulations on real-valued data, casting to the complex domain by
means of the Hilbert Transform, and verifying the potential interest of CVNN
even for non-complex data.
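To make the pieces the abstract names concrete (complex weight initialization adapted to the complex domain, plus a complex activation), here is a small numpy sketch. The Rayleigh-magnitude initialization follows the Glorot-style criterion used for deep complex networks, and modReLU is one standard complex activation; neither is claimed to match the cvnn toolbox's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_glorot(fan_in, fan_out):
    # Draw magnitude (Rayleigh) and phase (uniform) separately so the
    # variance criterion is adapted to the complex domain.
    sigma = np.sqrt(1.0 / (fan_in + fan_out))
    mag = rng.rayleigh(scale=sigma, size=(fan_in, fan_out))
    phase = rng.uniform(-np.pi, np.pi, size=(fan_in, fan_out))
    return mag * np.exp(1j * phase)

def modrelu(z, b=-0.1):
    # modReLU: ReLU applied to the magnitude, phase preserved
    m = np.abs(z)
    return np.where(m + b > 0, (m + b) / np.maximum(m, 1e-12) * z, 0)

W = complex_glorot(4, 3)                      # complex-valued layer weights
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = modrelu(x @ W)                            # complex forward pass
```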
Lung cancer is the leading cause of death among different types of cancers.
Every year, the lives lost due to lung cancer exceed those lost to pancreatic,
breast, and prostate cancer combined. The survival rate for lung cancer
patients is very low compared to other cancer patients due to late diagnostics.
Thus, early lung cancer diagnostics is crucial for patients to receive early
treatments, increasing the survival rate or even becoming cancer-free. This
paper proposed a deep-learning model for early lung cancer prediction and
diagnosis from Computed Tomography (CT) scans. The proposed model achieves high
accuracy. In addition, it can be a beneficial tool to support radiologists'
decisions in predicting and detecting lung cancer and its stage.
Graph neural networks (GNNs) are able to leverage the structure of graph data
by passing messages along the edges of the graph. While this allows GNNs to
learn features depending on the graph structure, for certain graph topologies
it leads to inefficient information propagation and a problem known as
oversquashing. This has recently been linked with the curvature and spectral
gap of the graph. On the other hand, adding edges to the message-passing graph
can lead to increasingly similar node representations and a problem known as
oversmoothing. We propose a computationally efficient algorithm that prevents
oversquashing by systematically adding edges to the graph based on spectral
expansion. We combine this with a relational architecture, which lets the GNN
preserve the original graph structure and provably prevents oversmoothing. We
find experimentally that our algorithm outperforms existing graph rewiring
methods in several graph classification tasks.
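A toy numpy illustration of the spectral quantity involved (this is not the paper's rewiring algorithm): taking the spectral gap to be the algebraic connectivity, i.e. the second-smallest Laplacian eigenvalue, adding even a single edge to a path graph visibly increases it:

```python
import numpy as np

def spectral_gap(edges, n):
    """Second-smallest eigenvalue of the graph Laplacian of an undirected graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return np.sort(np.linalg.eigvalsh(L))[1]

path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # path graph P5
cycle = path + [(4, 0)]                   # add one edge -> cycle C5
g_before = spectral_gap(path, 5)          # approx 0.382
g_after = spectral_gap(cycle, 5)          # approx 1.382
```

A larger gap means faster information mixing along the graph, which is the connection to oversquashing the abstract alludes to.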
In this work, we propose a zero-shot voice conversion method using speech
representations trained with self-supervised learning. First, we develop a
multi-task model to decompose a speech utterance into features such as
linguistic content, speaker characteristics, and speaking style. To disentangle
content and speaker representations, we propose a training strategy based on
Siamese networks that encourages similarity between the content representations
of the original and pitch-shifted audio. Next, we develop a synthesis model
with pitch and duration predictors that can effectively reconstruct the speech
signal from its decomposed representation. Our framework allows controllable
and speaker-adaptive synthesis to perform zero-shot any-to-any voice conversion
achieving state-of-the-art results on metrics evaluating speaker similarity,
intelligibility, and naturalness. Using just 10 seconds of data for a target
speaker, our framework can perform voice swapping and achieves a speaker
verification EER of 5.5% for seen speakers and 8.4% for unseen speakers.
The increasing application of Artificial Intelligence and Machine Learning
models poses potential risks of unfair behavior and, in light of recent
regulations, has attracted the attention of the research community. Several
researchers focused on seeking new fairness definitions or developing
approaches to identify biased predictions. However, none try to exploit the
counterfactual space to this aim. In that direction, the methodology proposed
in this work aims to unveil unfair model behaviors using counterfactual
reasoning in the case of fairness under unawareness setting. A counterfactual
version of equal opportunity named counterfactual fair opportunity is defined
and two novel metrics that analyze the sensitive information of counterfactual
samples are introduced. Experimental results on three different datasets show
the efficacy of our methodologies and our metrics, disclosing the unfair
behavior of classic machine learning and debiasing models.
Spherical harmonics provide a smooth, orthogonal, and symmetry-adapted basis
to expand functions on a sphere, and they are used routinely in computer
graphics, signal processing and different fields of science, from geology to
quantum chemistry. More recently, spherical harmonics have become a key
component of rotationally equivariant models for geometric deep learning, where
they are used in combination with distance-dependent functions to describe the
distribution of neighbors within local spherical environments within a point
cloud. We present a fast and elegant algorithm for the evaluation of the
real-valued spherical harmonics. Our construction integrates many of the
desirable features of existing schemes and allows to compute Cartesian
derivatives in a numerically stable and computationally efficient manner. We
provide an efficient C implementation of the proposed algorithm, along with
easy-to-use Python bindings.
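For orientation, real spherical harmonics can be assembled from a complex implementation such as SciPy's `sph_harm` (which uses the Condon-Shortley phase and the azimuthal-first argument convention); this is one common convention for the real basis, not necessarily the one used by the paper's C implementation:

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(m, l, theta, phi):
    # theta: azimuthal angle, phi: polar angle (SciPy's convention)
    if m > 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(m, l, theta, phi).real
    if m < 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(-m, l, theta, phi).imag
    return sph_harm(0, l, theta, phi).real

# Y_0^0 is constant: 1 / (2 * sqrt(pi)) ~ 0.2821, independent of the angles
y00 = real_sph_harm(0, 0, 0.3, 1.2)
```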
We present Trieste, an open-source Python package for Bayesian optimization
and active learning benefiting from the scalability and efficiency of
TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based
models within sequential decision-making loops, e.g. Gaussian processes from
GPflow or GPflux, or neural networks from Keras. This modular mindset is
central to the package and extends to our acquisition functions and the
internal dynamics of the decision-making loop, both of which can be tailored
and extended by researchers or engineers when tackling custom use cases.
Trieste is a research-friendly and production-ready toolkit backed by a
comprehensive test suite, extensive documentation, and available at
https://github.com/secondmind-labs/trieste.
In this work we developed a deep learning technique that successfully solves
a non-linear dynamic control problem. Instead of directly tackling the control
problem, we combined methods in probabilistic neural networks and a
Kalman-Filter-inspired model to build a non-linear state estimator for the
system. We then used the estimated states to implement a trivial controller for
the now fully observable system. We applied this technique to a crucial
non-linear control problem that arises in the operation of the LIGO system, an
interferometric gravitational-wave observatory. We demonstrated in simulation
that our approach can learn from data to estimate the state of the system,
allowing successful control of the interferometer's mirror. We also
developed a computationally efficient model that can run in real time at high
sampling rate on a single modern CPU core, one of the key requirements for the
implementation of our solution in the LIGO digital control system. We believe
these techniques could be used to help tackle similar non-linear control
problems in other applications.
Robotics, automation, and related Artificial Intelligence (AI) systems have
become pervasive bringing in concerns related to security, safety, accuracy,
and trust. With growing dependency on physical robots that work in close
proximity to humans, the security of these systems is becoming increasingly
important to prevent cyber-attacks that could lead to privacy invasion,
critical operations sabotage, and bodily harm. The current shortfall of
professionals who can defend such systems demands development and integration
of such a curriculum. This course description includes details about seven
self-contained and adaptive modules on "AI security threats against pervasive
robotic systems". Topics include: 1) Introduction, examples of attacks, and
motivation; 2) Robotic AI attack surfaces and penetration testing; 3) Attack
patterns and security strategies for input sensors; 4) Training attacks and
associated security strategies; 5) Inference attacks and associated security
strategies; 6) Actuator attacks and associated security strategies; and
7) Ethics of AI, robotics, and cybersecurity.
Decentralised Machine Learning (DML) enables collaborative machine learning
without centralised input data. Federated Learning (FL) and Edge Inference are
examples of DML. While tools for DML (especially FL) are starting to flourish,
many are not flexible and portable enough to experiment with novel systems
(e.g., RISC-V), non-fully connected topologies, and asynchronous collaboration
schemes. We overcome these limitations via a domain-specific language that
allows mapping DML schemes to an underlying middleware, i.e. the \ff parallel
programming library. We experiment with it by generating different working DML
schemes on two emerging architectures (ARM-v8, RISC-V) and the x86-64 platform.
We characterise the performance and energy efficiency of the presented schemes
and systems. As a byproduct, we introduce a RISC-V porting of the PyTorch
framework, the first publicly available to our knowledge.
This paper considers the use of recently proposed optimal transport-based
multivariate test statistics, namely rank energy and its variant the soft rank
energy derived from entropically regularized optimal transport, for the
unsupervised nonparametric change point detection (CPD) problem. We show that
the soft rank energy enjoys both fast rates of statistical convergence and
robust continuity properties which lead to strong performance on real datasets.
Our theoretical analyses remove the need for resampling and out-of-sample
extensions previously required to obtain such rates. In contrast the rank
energy suffers from the curse of dimensionality in statistical estimation and
moreover can signal a change point from arbitrarily small perturbations, which
leads to a high rate of false alarms in CPD. Additionally, under mild
regularity conditions, we quantify the discrepancy between soft rank energy and
rank energy in terms of the regularization parameter. Finally, we show our
approach performs favorably in numerical experiments compared to several other
optimal transport-based methods as well as maximum mean discrepancy.
We consider the problem of testing the identity of a reversible Markov chain
against a reference from a single trajectory of observations. Employing the
recently introduced notion of a lumping-congruent Markov embedding, we show
that, at least in a mildly restricted setting, testing identity to a reversible
chain reduces to testing to a symmetric chain over a larger state space and
recover state-of-the-art sample complexity for the problem.
Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been
proposed and studied, but these risks are all at least as sensitive as the mean
to loss tails on the upside, and tend to ignore deviations on the downside. We
study a complementary new risk class that penalizes loss deviations in a
bi-directional manner, while having more flexibility in terms of tail
sensitivity than is offered by mean-variance. This class lets us derive
high-probability learning guarantees without explicit gradient clipping, and
empirical tests using both simulated and real data illustrate a high degree of
control over key properties of the test loss distribution incurred by
gradient-based learners.
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in the number of samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
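The recursive, one-sample-at-a-time setup the abstract describes is classical Robbins-Monro stochastic approximation. A minimal generic root-finding sketch (illustrative only, not the paper's UBSR-specific estimator); the sanity check uses g(x, t) = x - t, for which the recursion with 1/k stepsizes reduces exactly to the running sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)

def robbins_monro(sample, g, t0=0.0, n=200_000):
    """Track the root of t -> E[g(X, t)] = 0 from one sample per step."""
    t = t0
    for k in range(1, n + 1):
        t += (1.0 / k) * g(sample(), t)
    return t

# With g(x, t) = x - t the iterate is the running mean, so it converges
# to E[X] = 0 for standard normal samples.
m = robbins_monro(rng.standard_normal, lambda x, t: x - t)
```

For UBSR one would replace g with the shortfall condition defining the risk level, which is exactly the "root finding" reformulation the abstract refers to.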
I have constructed a novel ML (NLP) dataset for classification and labeled it with three classes. The dataset is rather small with about 700 examples, out of which the classes have about 400, 200, and 100 examples respectively. I would like to publish it and describe it in an official publication for a workshop or a conference.
When looking at related datasets and publication, I see that it is common for authors to publish the dataset already split into three chunks - train, dev, test dataset (see the images). It is also common in these papers to provide the performance of baseline models on the dataset. Considering the dataset's small size, I feel like doing a 5-fold cross-validation would be a good alternative for such a small dataset, rather than doing something like a split into 450-1…
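A stratified variant of the 5-fold idea keeps the 400/200/100 class imbalance intact in every fold, which matters for the 100-example minority class. A quick scikit-learn sketch with placeholder data matching the described class sizes (the labels and features here are hypothetical stand-ins for the actual dataset):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels matching the described class sizes (400 / 200 / 100)
y = np.array([0] * 400 + [1] * 200 + [2] * 100)
X = np.zeros((len(y), 1))                      # placeholder features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_counts = [np.bincount(y[test_idx], minlength=3)
               for _, test_idx in skf.split(X, y)]
# Each test fold preserves the 4:2:1 class ratio: [80, 40, 20]
```

Publishing the fold indices (or the random seed) alongside the data would let others reproduce the baseline numbers even without a fixed train/dev/test split.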
Do you think AI will be able to give trustable advice in the future?
Doing research for a school project. If you have the time, I would appreciate it if you could fill out this form.
https://forms.gle/X7Fg8cQsqWb278bm7
submitted by /u/Jakets_V
The more significant ChatGPT usage becomes, the more concerns the tool raises.
What do you think: is it an incredible source of inspiration or the death of art as we know it?
Would you be able to distinguish between AI-generated text and human poetry?
Take part in the experiment and share your thoughts here: ChatGPT Survey.
submitted by /u/Lonely-Wish-6377
North American reinforcement materials market is anticipated to display revenue growth at a CAGR of 5.64% by 2028. Get free sample report
North America Reinforcement Materials Market
submitted by /u/shreyaslakhare11
The Middle East and Africa reinforcement materials market is projected to grow at a CAGR of 5.13% by 2028. Get free sample report
Middle East and Africa Reinforcement Materials Market
submitted by /u/shreyaslakhare11
Europe’s reinforcement materials market is likely to register growth at a CAGR of 5.87% based on revenue during the period 2021-2028. Get free sample report
Europe Reinforcement Materials Market
submitted by /u/shreyaslakhare11
The Asia-Pacific reinforcement materials market is assessed to grow at a CAGR of 6.33% over the forecast years 2021-2028. Get free sample report
Asia-Pacific Reinforcement Materials Market
submitted by /u/shreyaslakhare11
The Global Reinforcement Materials Market is estimated to grow at a CAGR of 6.02% and is likely to reach $12,826 million by 2028. Get a Free Sample Report
Reinforcement Materials Market
submitted by /u/shreyaslakhare11
We consider the optimal sample complexity theory of tabular reinforcement
learning (RL) for controlling the infinite horizon discounted reward in a
Markov decision process (MDP). Optimal min-max complexity results have been
developed for tabular RL in this setting, leading to a sample complexity
dependence on $\gamma$ and $\epsilon$ of the form $\tilde
\Theta((1-\gamma)^{-3}\epsilon^{-2})$, where $\gamma$ is the discount factor
and $\epsilon$ is the solution error tolerance. However, in many applications
of interest, the optimal policy (or all policies) will induce mixing. We show
that in these settings the optimal min-max complexity is $\tilde
\Theta(t_{\text{minorize}}(1-\gamma)^{-2}\epsilon^{-2})$, where
$t_{\text{minorize}}$ is a measure of mixing that is within an equivalent
factor of the total variation mixing time. Our analysis is based on
regeneration-type ideas, that, we believe are of independent interest since
they can be used to study related problems for general state space MDPs.
Variational inequalities are a broad and flexible class of problems that
includes minimization, saddle point, fixed point problems as special cases.
Therefore, variational inequalities are used in a variety of applications
ranging from equilibrium search to adversarial learning. Today's realities with
the increasing size of data and models demand parallel and distributed
computing for real-world machine learning problems, most of which can be
represented as variational inequalities. Meanwhile, most distributed approaches
have a significant bottleneck: the cost of communication. The three main
techniques to reduce both the total number of communication rounds and the cost
of one such round are the use of similarity of local functions, compression of
transmitted information and local updates. In this paper, we combine all these
approaches. Such a triple synergy did not exist before for variational
inequalities and saddle problems, nor even for minimization problems. The
methods presented in this paper have the best theoretical guarantees of
communication complexity and are significantly ahead of other methods for
distributed variational inequalities. The theoretical results are confirmed by
adversarial learning experiments on synthetic and real datasets.
( 2
min )
We prove that various stochastic gradient descent methods, including the
stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic
Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict
saddle manifold. To the best of our knowledge, this is the first time such
results are obtained for SHB and SNAG methods. Moreover, our analysis expands
upon previous studies on SGD by removing the need for bounded gradients of the
objective function and uniformly bounded noise. Instead, we introduce a more
practical local boundedness assumption for the noisy gradient, which is
naturally satisfied in empirical risk minimization problems typically seen in
training of neural networks.
( 2
min )
Mitigating the discrimination of machine learning models has gained
increasing attention in medical image analysis. However, few works address
fair treatment of patients with multiple sensitive demographic attributes, a
crucial yet challenging problem for real-world clinical applications. In this
paper, we propose a novel method for fair representation learning with respect
to multi-sensitive attributes. We pursue the independence between target and
multi-sensitive representations by achieving orthogonality in the
representation space. Concretely, we enforce the column space orthogonality by
keeping target information on the complement of a low-rank sensitive space.
Furthermore, in the row space, we encourage feature dimensions between target
and sensitive representations to be orthogonal. The effectiveness of the
proposed method is demonstrated with extensive experiments on the CheXpert
dataset. To the best of our knowledge, this is the first work to mitigate unfairness
with respect to multiple sensitive attributes in the field of medical imaging.
( 2
min )
We present a new convolution layer for deep learning architectures which we
call QuadConv -- an approximation to continuous convolution via quadrature. Our
operator is developed explicitly for use on non-uniform, mesh-based data, and
accomplishes this by learning a continuous kernel that can be sampled at
arbitrary locations. Moreover, the construction of our operator admits an
efficient implementation which we detail and construct. In the setting of
compressing data arising from partial differential equation (PDE) simulations,
we show that QuadConv can match the performance of standard discrete
convolutions on uniform grid data by comparing a QuadConv autoencoder (QCAE) to
a standard convolutional autoencoder (CAE). Further, we show that the QCAE can
maintain this accuracy even on non-uniform data.
( 2
min )
A current goal in the graph neural network literature is to enable
transformers to operate on graph-structured data, given their success on
language and vision tasks. Since the transformer's original sinusoidal
positional encodings (PEs) are not applicable to graphs, recent work has
focused on developing graph PEs, rooted in spectral graph theory or various
spatial features of a graph. In this work, we introduce a new graph PE, Graph
Automaton PE (GAPE), based on weighted graph-walking automata (a novel
extension of graph-walking automata). We compare the performance of GAPE with
other PE schemes on both machine translation and graph-structured tasks, and we
show that it generalizes several other PEs. An additional contribution of this
study is a theoretical and controlled experimental comparison of many recent
PEs in graph transformers, independent of the use of edge features.
( 2
min )
Molecular conformation generation (MCG) is a fundamental and important
problem in drug discovery. Many traditional methods have been developed to
solve the MCG problem, such as systematic searching, model-building, random
searching, distance geometry, molecular dynamics, Monte Carlo methods, etc.
However, they have limitations that depend on the molecular structure.
Recently, plenty of deep learning based MCG methods have appeared, claiming to
largely outperform the traditional methods. However, to our surprise, we
design a simple, cheap, parameter-free algorithm based on the traditional
methods and find that it is comparable to, or even outperforms, deep learning
based MCG methods on the widely used GEOM-QM9 and GEOM-Drugs benchmarks. In
particular, our designed algorithm is simply clustering of the RDKit-generated
conformations. We hope our findings can help the community to revise the deep
learning methods for MCG. The code of the proposed algorithm could be found at
https://gist.github.com/ZhouGengmo/5b565f51adafcd911c0bc115b2ef027c.
( 2
min )
Contrastive learning is a powerful framework for learning self-supervised
representations that generalize well to downstream supervised tasks. We show
that multiple existing contrastive learning methods can be reinterpreted as
learning kernel functions that approximate a fixed positive-pair kernel. We
then prove that a simple representation obtained by combining this kernel with
PCA provably minimizes the worst-case approximation error of linear predictors,
under a straightforward assumption that positive pairs have similar labels. Our
analysis is based on a decomposition of the target function in terms of the
eigenfunctions of a positive-pair Markov chain, and a surprising equivalence
between these eigenfunctions and the output of Kernel PCA. We give
generalization bounds for downstream linear prediction using our Kernel PCA
representation, and show empirically on a set of synthetic tasks that applying
Kernel PCA to contrastive learning models can indeed approximately recover the
Markov chain eigenfunctions, although the accuracy depends on the kernel
parameterization as well as on the augmentation strength.
( 2
min )
Tongue twisters are meaningful sentences that are difficult to pronounce. The
process of automatically generating tongue twisters is challenging since the
generated utterance must satisfy two conditions at once: phonetic difficulty
and semantic meaning. Furthermore, phonetic difficulty is itself hard to
characterize and is expressed in natural tongue twisters through a
heterogeneous mix of phenomena such as alliteration and homophony. In this
paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue
Twisters Automatically. We leverage phoneme representations to capture the
notion of phonetic difficulty, and we train language models to generate
original tongue twisters on two proposed task settings. To do this, we curate a
dataset called PANCETTA, consisting of existing English tongue twisters.
Through automatic and human evaluation, as well as qualitative analysis, we
show that PANCETTA generates novel, phonetically difficult, fluent, and
semantically meaningful tongue twisters.
( 2
min )
The Baum-Welch (B-W) algorithm is the most widely accepted method for
inferring hidden Markov models (HMM). However, it is prone to getting stuck in
local optima, and can be too slow for many real-time applications. Spectral
learning of HMMs (SHMM), based on the method of moments (MOM), has been
proposed in the literature to overcome these obstacles. Despite its promise,
asymptotic theory for SHMM has been elusive, and the long-run performance of
SHMM can degrade due to unchecked propagation of error. In this paper, we (1)
provide an asymptotic distribution for the approximate error of the likelihood
estimated by SHMM, and (2) propose a novel algorithm called projected SHMM
(PSHMM) that mitigates the problem of error propagation, and (3) develop online
learning variations of both SHMM and PSHMM that accommodate potential
nonstationarity. We compare the performance of SHMM with PSHMM and estimation
through the B-W algorithm on both simulated data and data from real world
applications, and find that PSHMM not only retains the computational advantages
of SHMM, but also provides more robust estimation and forecasting.
( 2
min )
Arunachalam and De Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this ostensibly surprising message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
We propose to explore the potential of physics-informed neural networks
(PINNs) in solving a class of partial differential equations (PDEs) used to
model the propagation of chronic inflammatory bowel diseases, such as Crohn's
disease and ulcerative colitis. An unsupervised approach was favored during
training of the deep neural network. The underlying biological system is
complex, characterized by intricate feedback loops and limited availability of
high-quality data, which motivates this exploratory assessment of PINNs.
Beyond that assessment, we also aim to emphasize the principles of
reproducibility and transparency in our approach, with a specific focus on
robustness and generalizability. We will quantify the relevance of the PINN
method on several linear and non-linear PDEs relevant to biology. However, it
is important to note that the final
solution is dependent on the initial conditions, chosen boundary conditions,
and neural network architectures.
( 2
min )
Gamma-Phi losses constitute a family of multiclass classification loss
functions that generalize the logistic and other common losses, and have found
application in the boosting literature. We establish the first general
sufficient condition for the classification-calibration of such losses. In
addition, we show that a previously proposed sufficient condition is in fact
not sufficient.
( 2
min )
Dimensionality reduction (DR) plays a vital role in the visual analysis of
high-dimensional data. One main aim of DR is to reveal hidden patterns that lie
on intrinsic low-dimensional manifolds. However, DR often overlooks important
patterns when the manifolds are distorted or masked by certain influential data
attributes. This paper presents a feature learning framework, FEALM, designed
to generate a set of optimized data projections for nonlinear DR in order to
capture important patterns in the hidden manifolds. These projections produce
maximally different nearest-neighbor graphs so that resultant DR outcomes are
significantly different. To achieve such a capability, we design an
optimization algorithm as well as introduce a new graph dissimilarity measure,
named neighbor-shape dissimilarity. Additionally, we develop interactive
visualizations to assist comparison of obtained DR results and interpretation
of each DR result. We demonstrate FEALM's effectiveness through experiments and
case studies using synthetic and real-world datasets.
( 2
min )
Semi-supervised learning is a powerful technique for leveraging unlabeled
data to improve machine learning models, but it can be affected by the presence
of ``informative'' labels, which occur when some classes are more likely to be
labeled than others. In the missing data literature, such labels are called
missing not at random. In this paper, we propose a novel approach to address
this issue by estimating the missing-data mechanism and using inverse
propensity weighting to debias any SSL algorithm, including those using data
augmentation. We also propose a likelihood ratio test to assess whether or not
labels are indeed informative. Finally, we demonstrate the performance of the
proposed methods on different datasets, in particular on two medical datasets
for which we design pseudo-realistic missing data scenarios.
( 2
min )
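The debiasing idea is standard inverse propensity weighting applied to the labeled subsample. A minimal numerical sketch with a known (rather than estimated) labeling mechanism: class 1 is labeled far less often, so the naive class-frequency estimate is biased while the weighted one is not. The toy labeling probabilities are illustrative assumptions:

```python
import numpy as np

def ipw_weights(labels, propensity):
    """Inverse-propensity weights for labeled examples. `propensity[c]` is
    the probability that an example of class c gets labeled; under an MNAR
    labeling mechanism, weighting each labeled example by 1/propensity
    debiases estimates computed on the labeled set (a minimal sketch of
    the paper's idea, which plugs such weights into full SSL losses)."""
    return 1.0 / propensity[labels]

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=20000)      # balanced ground truth
prop = np.array([0.9, 0.3])             # class 1 is rarely labeled ("informative" labels)
labeled = rng.random(y.size) < prop[y]
y_obs = y[labeled]

naive = y_obs.mean()                          # biased estimate of P(y = 1)
w = ipw_weights(y_obs, prop)
debiased = np.average(y_obs, weights=w)       # close to the true 0.5
```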
In this paper, we propose a model-free feature selection method for
ultra-high dimensional data with massive numbers of features. It is a
two-phase procedure: we propose to combine the fused Kolmogorov filter with
random-forest-based recursive feature elimination (RFE) to remove model
limitations and reduce the computational complexity. The method is fully
nonparametric and can work with various types of datasets. It has several
appealing characteristics, namely accuracy, model-freeness, and computational
efficiency, and can be widely used in practical problems, such as
multiclass classification, nonparametric regression, and Poisson regression,
among others. We show that the proposed method is selection consistent and
$L_2$ consistent under weak regularity conditions. We further demonstrate the
superior performance of the proposed method over other existing methods by
simulations and real data examples.
( 2
min )
Hi everyone, I used several machine vision algorithms to determine the fastest lane at border crossings. I have worked on this for the past few months and would love to know what you think about it. You can check out the detailed steps and code in the linked Medium article.
Modern model pre-training often calls for larger cluster deployment to reduce time and cost. At the server level, such training workloads demand faster compute and increased memory allocation. As models grow to hundreds of billions of parameters, they require a distributed training mechanism that spans multiple nodes (instances). In October 2022, we launched Amazon EC2 […]
( 10
min )
Enterprises across the globe are looking to utilize multiple data sources to implement a unified search experience for their employees and end customers. Considering the large volume of data that needs to be examined and indexed, the retrieval speed, solution scalability, and search performance become key factors to consider when choosing an enterprise intelligent search […]
( 7
min )
Novel AI technologies are generating images, stories and, now, new ways to imagine the automotive future. At NVIDIA GTC, a global conference for the era of AI and the metaverse running online March 20-23, industry luminaries working on these breakthroughs will come together and share their visions to transform transportation. This year’s slate of in-depth […]
( 5
min )
The video above represents one of the first times that a pangolin, one of the world’s most critically endangered species, was detected in real time using artificial intelligence. A U.K.-based nonprofit called Conservation AI made this possible with the help of NVIDIA technology. Such use of AI can help track even the rarest, most reclusive […]
( 7
min )
Fellow Hunters, get ready! This GFN Thursday welcomes Capcom’s Monster Hunter Rise and the expansion Sunbreak to the cloud, arriving soon for members. Settle down for the weekend with 10 new games supported in the GeForce NOW library, including The Settlers: New Allies. Plus, Amsterdam and Ashburn are next to light up on the RTX […]
( 5
min )
We’re clarifying how ChatGPT's behavior is shaped and our plans for improving that behavior, allowing more user customization, and getting more public input into our decision-making in these areas.
OpenAI’s mission is to ensure that artificial general intelligence (AGI)[1] benefits all of humanity.
( 6
min )
The goal of this paper is to make a strong case for the use of dynamical
models when applying reinforcement learning (RL) to feedback control of
dynamical systems governed by partial differential equations (PDEs). To bridge
the gap between the immense promise of RL and its applicability to complex
engineering systems, the main challenges to address are the massive
training-data requirements and the lack of performance guarantees. We present
a solution for the first issue using a data-driven surrogate model in the form
of a convolutional LSTM with actuation. We demonstrate that learning an
actuated model in parallel to training the RL agent significantly reduces the
total amount of required data sampled from the real system. Furthermore, we
show that iteratively updating the model is of major importance to avoid biases
in the RL training. Detailed ablation studies reveal the most important
ingredients of the modeling process. We use the chaotic Kuramoto-Sivashinsky
equation to demonstrate our findings.
( 2
min )
Due to the precautionary measures during the COVID-19 pandemic, many
universities offered unproctored take-home exams. We propose methods to detect
potential collusion between students and apply our approach on event log data
from take-home exams during the pandemic. We find groups of students with
suspiciously similar exams. In addition, we compare our findings to a proctored
control group. From this comparison, we establish a rule of thumb for
deciding which cases are "outstandingly similar", i.e., suspicious.
( 2
min )
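A stripped-down version of the similarity analysis: compute the fraction of identically answered items for every pair of students and flag pairs that are outliers relative to the cohort. The answer-matrix representation and the z-score cutoff are illustrative stand-ins for the paper's event-log features and calibrated rule of thumb:

```python
import numpy as np

def pairwise_similarity(answers):
    """Fraction of identically answered items for every pair of students
    (a crude stand-in for the paper's event-log comparison)."""
    A = np.asarray(answers)
    n = A.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = (A[i] == A[j]).mean()
    return sim

def flag_suspicious(sim, z=3.0):
    """Flag pairs whose similarity is several standard deviations above the
    mean pairwise similarity, i.e., 'outstandingly similar' pairs."""
    iu = np.triu_indices_from(sim, k=1)
    vals = sim[iu]
    cut = vals.mean() + z * vals.std()
    return [(int(i), int(j)) for i, j in zip(*iu) if sim[i, j] > cut]

rng = np.random.default_rng(5)
answers = rng.integers(0, 4, size=(30, 40))   # 30 students, 40 multiple-choice items
answers[1] = answers[7]                       # students 1 and 7 submitted identical answers
pairs = flag_suspicious(pairwise_similarity(answers))
```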
Recent applications of pattern recognition techniques on brain connectome
classification using functional connectivity (FC) neglect the non-Euclidean
topology and causal dynamics of brain connectivity across time. In this paper,
a deep probabilistic spatiotemporal framework developed based on variational
Bayes (DSVB) is proposed to learn time-varying topological structures in
dynamic brain FC networks for autism spectrum disorder (ASD) identification.
The proposed framework incorporates a spatial-aware recurrent neural network to
capture rich spatiotemporal patterns across dynamic FC networks, followed by a
fully-connected neural network to exploit these learned patterns for
subject-level classification. To overcome model overfitting on limited training
datasets, an adversarial training strategy is introduced to learn graph
embedding models that generalize well to unseen brain networks. Evaluation on
the ABIDE resting-state functional magnetic resonance imaging dataset shows
that our proposed framework significantly outperformed state-of-the-art methods
in identifying ASD. Dynamic FC analyses with the DSVB-learned embeddings
reveal an apparent group difference between ASD and healthy controls in
network profiles
and switching dynamics of brain states.
( 2
min )
Most works on the fairness of machine learning systems focus on the blind
optimization of common fairness metrics, such as Demographic Parity and
Equalized Odds. In this paper, we conduct a comparative study of several bias
mitigation approaches to investigate their behaviors at a fine grain, the
prediction level. Our objective is to characterize the differences between fair
models obtained with different approaches. With comparable performances in
fairness and accuracy, are the different bias mitigation approaches impacting a
similar number of individuals? Do they mitigate bias in a similar way? Do they
affect the same individuals when debiasing a model? Our findings show that
bias mitigation approaches differ substantially in their strategies, both in
the number of impacted individuals and in the populations targeted. More
surprisingly, these results hold even across several runs of the same
mitigation approach.
These findings raise questions about the limitations of the current group
fairness metrics, as well as the arbitrariness, hence unfairness, of the whole
debiasing process.
( 2
min )
The existence of external (``side'') semantic knowledge has been shown to
result in more expressive computational event models. To enable the use of side
information that may be noisy or missing, we propose a semi-supervised
information bottleneck-based discrete latent variable model. We reparameterize
the model's discrete variables with auxiliary continuous latent variables and a
light-weight hierarchical structure. Our model is learned to minimize the
mutual information between the observed data and optional side knowledge that
is not already captured by the new, auxiliary variables. We theoretically show
that our approach generalizes past approaches, and perform an empirical case
study of our approach on event modeling. We corroborate our theoretical results
with strong empirical experiments, showing that the proposed method outperforms
previously proposed approaches on multiple datasets.
( 2
min )
This paper examines the separation of wireless communication and radar
signals, thereby enabling coexistence and easing spectrum sensing. First,
assuming that the channel impulse response is known to the receivers
(communication and radar), we showed that optimizing the beamforming weights
mitigates the interference caused by the signals and improves the physical
layer security (PLS) of the system. Furthermore, when the channel responses
were unknown, we designed an interference filter as a low-complex noise and
interference cancellation autoencoder. By mitigating the interference on the
legitimate users, the PLS was guaranteed. Results showed that even for a low
signal-to-noise ratio, the autoencoder produces low root-mean-square error
(RMSE) values.
( 2
min )
Intrigued by the claims of emergent reasoning capabilities in LLMs trained on
general web corpora, in this paper, we set out to investigate their planning
capabilities. We aim to evaluate (1) how good LLMs are by themselves in
generating and validating simple plans in commonsense planning tasks (of the
type that humans are generally quite good at) and (2) how good LLMs are in
being a source of heuristic guidance for other agents--either AI planners or
human planners--in their planning tasks. To investigate these questions in a
systematic rather than anecdotal manner, we start by developing a benchmark
suite based on the kinds of domains employed in the International Planning
Competition. On this benchmark, we evaluate LLMs in three modes: autonomous,
heuristic, and human-in-the-loop. Our results show that LLMs' ability to
autonomously generate executable plans is quite meager, with an average
success rate of only about 3%. The heuristic and human-in-the-loop modes show
slightly more
promise. In addition to these results, we also make our benchmark and
evaluation tools available to support investigations by the research community.
( 2
min )
Artificial neural networks are being proposed as models of parts of the
brain. The networks are compared to recordings of biological neurons, and good
performance in reproducing neural responses is considered to support the
model's validity. A key question is how much this system identification
approach tells us about brain computation. Does it validate one model
architecture over another? We evaluate the most commonly used comparison
techniques, such as a linear encoding model and centered kernel alignment, to
correctly identify a model by replacing brain recordings with known ground
truth models. System identification performance is quite variable; it also
depends significantly on factors independent of the ground truth architecture,
such as stimuli images. In addition, we show the limitations of using
functional similarity scores in identifying higher-level architectural motifs.
( 2
min )
Bilevel Optimization has witnessed notable progress recently with new
emerging efficient algorithms, yet it is underexplored in the Federated
Learning setting. It is unclear how the challenges of Federated Learning affect
the convergence of bilevel algorithms. In this work, we study Federated Bilevel
Optimization problems. We first propose the FedBiO algorithm that solves the
hyper-gradient estimation problem efficiently, then we propose FedBiOAcc to
accelerate FedBiO. FedBiO has communication complexity $O(\epsilon^{-1.5})$
with linear speed up, while FedBiOAcc achieves communication complexity
$O(\epsilon^{-1})$, sample complexity $O(\epsilon^{-1.5})$ and also the linear
speed up. We also study Federated Bilevel Optimization problems with local
lower level problems, and prove that FedBiO and FedBiOAcc converge at the
same rate with minor modifications.
( 2
min )
Sequential monitoring of high-dimensional nonlinear time series is studied
for a projection of the second-moment matrix, a problem interesting in its own
right and specifically arising in finance and deep learning. Open-end as well
as closed-end monitoring is studied under mild assumptions on the training
sample and the observations of the monitoring period. The asymptotic theory
is based on Gaussian approximations of projected partial sums, allowing for an
estimated
projection vector. Estimation is studied both for classical
non-$\ell_0$-sparsity as well as under sparsity. For the case that the optimal
projection depends on the unknown covariance matrix, hard- and soft-thresholded
estimators are studied. Applications in finance and training of deep neural
networks are discussed. The proposed detectors typically allow a dramatic
reduction in the required computational cost, as illustrated by monitoring
synthetic data.
( 2
min )
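To make the sequential setup concrete, here is a much-simplified open-end detector: project each new observation onto a fixed vector and raise an alarm when the normalized partial sum of deviations from the training mean crosses a boundary. The paper's statistic targets a projection of the second-moment matrix and allows estimated, possibly sparse projections; this mean-shift CUSUM-type sketch only conveys the sequential mechanics:

```python
import numpy as np

def detect_change(train, stream, v, threshold):
    """Open-end monitoring sketch: standardize the projection x @ v using
    the training sample, accumulate deviations, and alarm once the partial
    sum exceeds a sqrt(k)-scaled boundary (far simpler than the paper's
    second-moment statistic)."""
    proj_train = train @ v
    mu, sd = proj_train.mean(), proj_train.std()
    s = 0.0
    for k, x in enumerate(stream, start=1):
        s += (x @ v - mu) / sd
        if abs(s) / np.sqrt(k) > threshold:
            return k              # detection time
    return None                   # open-end: no alarm raised

rng = np.random.default_rng(6)
v = np.array([1.0, -1.0]) / np.sqrt(2)           # fixed projection vector
train = rng.standard_normal((500, 2))            # training sample
calm = rng.standard_normal((100, 2))             # in-control stretch
shifted = rng.standard_normal((100, 2)) + np.array([2.0, 0.0])  # change after step 100
alarm = detect_change(train, np.vstack([calm, shifted]), v, threshold=5.0)
```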
We introduce a boosting algorithm to pre-process data for fairness. Starting
from an initial fair but inaccurate distribution, our approach shifts towards
better data fitting while still ensuring a minimal fairness guarantee. To do
so, it learns the sufficient statistics of an exponential family with
boosting-compliant convergence. Importantly, we are able to theoretically prove
that the learned distribution will have a representation rate and statistical
rate data fairness guarantee. Unlike recent optimization based pre-processing
methods, our approach can be easily adapted for continuous domain features.
Furthermore, when the weak learners are specified to be decision trees, the
sufficient statistics of the learned distribution can be examined to provide
clues on sources of (un)fairness. Empirical results are presented to
demonstrate the quality of the results on real-world data.
( 2
min )
Energy efficient navigation constitutes an important challenge in electric
vehicles, due to their limited battery capacity. We employ a Bayesian approach
to model the energy consumption at road segments for efficient navigation. In
order to learn the model parameters, we develop an online learning framework
and investigate several exploration strategies such as Thompson Sampling and
Upper Confidence Bound. We then extend our online learning framework to the
multi-agent setting, where multiple vehicles adaptively navigate and learn the
parameters of the energy model. We analyze Thompson Sampling and establish
rigorous regret bounds on its performance in the single-agent and multi-agent
settings, through an analysis of the algorithm under batched feedback. Finally,
we demonstrate the performance of our methods via experiments on several
real-world city road networks.
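For concreteness, here is a minimal sketch of Thompson Sampling over independent Gaussian segment-cost posteriors with known observation noise. The route/segment setup and all parameter values are illustrative assumptions; the paper's actual energy model, batched feedback, and multi-agent machinery are not reproduced.

```python
import numpy as np

def thompson_route(routes, mu, var, rng):
    """Sample one cost per segment from the posterior and pick the
    route whose *sampled* total energy is smallest."""
    theta = rng.normal(mu, np.sqrt(var))
    return int(np.argmin([theta[r].sum() for r in routes]))

def update(mu, var, seg, obs, noise_var=1.0):
    """Conjugate Gaussian update for one observed segment energy."""
    prec = 1.0 / var[seg] + 1.0 / noise_var
    mu[seg] = (mu[seg] / var[seg] + obs / noise_var) / prec
    var[seg] = 1.0 / prec

rng = np.random.default_rng(0)
true_energy = np.array([1.0, 3.0])       # unknown per-segment consumption
routes = [[0], [1]]                      # two single-segment routes
mu, var = np.zeros(2), np.full(2, 10.0)  # diffuse Gaussian prior

picks = []
for _ in range(200):
    r = thompson_route(routes, mu, var, rng)
    picks.append(r)
    for seg in routes[r]:
        update(mu, var, seg, true_energy[seg] + rng.normal(0.0, 1.0))
```

With a cost gap of 2 and unit observation noise, the sampler concentrates on the cheaper route after a short exploration phase.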
( 2
min )
We prove that the Minimum Description Length learning rule exhibits tempered
overfitting. We obtain tempered agnostic finite sample learning guarantees and
characterize the asymptotic behavior in the presence of random label noise.
( 2
min )
We study the convergence rate of discretized Riemannian Hamiltonian Monte
Carlo on sampling from distributions in the form of $e^{-f(x)}$ on a convex
body $\mathcal{M}\subset\mathbb{R}^{n}$. We show that for distributions in the
form of $e^{-\alpha^{\top}x}$ on a polytope with $m$ constraints, the
convergence rate of a family of commonly-used integrators is independent of
$\left\Vert \alpha\right\Vert _{2}$ and the geometry of the polytope. In
particular, the implicit midpoint method (IMM) and the generalized Leapfrog
method (LM) have a mixing time of $\widetilde{O}\left(mn^{3}\right)$ to achieve
$\epsilon$ total variation distance to the target distribution. These
guarantees are based on a general bound on the convergence rate for densities
of the form $e^{-f(x)}$ in terms of parameters of the manifold and the
integrator. Our theoretical guarantee complements the empirical results of
[KLSV22], which shows that RHMC with IMM can sample ill-conditioned, non-smooth
and constrained distributions in very high dimension efficiently in practice.
( 2
min )
Monotonic linear interpolation (MLI), the phenomenon that the loss and
accuracy are monotonic along the line connecting a random initialization with
the minimizer it converges to, is commonly observed in the training of neural
networks. Such a phenomenon may seem to suggest that optimization of neural
networks is easy. In this paper, we show that the MLI property is not
networks is easy. In this paper, we show that the MLI property is not
necessarily related to the hardness of optimization problems, and empirical
observations on MLI for deep neural networks depend heavily on biases. In
particular, we show that interpolating both weights and biases linearly leads
to very different influences on the final output, and when different classes
have different last-layer biases on a deep network, there will be a long
plateau in both the loss and accuracy interpolation (which existing theory of
MLI cannot explain). We also show how the last-layer biases for different
classes can be different even on a perfectly balanced dataset using a simple
model. Empirically we demonstrate that similar intuitions hold on practical
networks and realistic datasets.
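The MLI phenomenon is easy to probe numerically. Below is a small illustrative sketch (not the paper's setup): logistic regression trained by plain gradient descent, with the loss evaluated along the straight line from the random initialization to the solution, interpolating weights and biases together.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def loss(w, b):
    """Mean binary cross-entropy of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# train with gradient descent from a random init (w0, b0)
w0, b0 = rng.normal(size=2), rng.normal()
w, b = w0.copy(), b0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.5 * (X.T @ g) / len(y)
    b -= 0.5 * g.mean()

# evaluate the loss along the line from (w0, b0) to (w, b)
ts = np.linspace(0.0, 1.0, 21)
path = [loss((1 - t) * w0 + t * w, (1 - t) * b0 + t * b) for t in ts]
```

For this convex toy problem the interpolated loss decreases toward the solution; the interesting cases discussed in the abstract are deep networks, where last-layer biases can produce long plateaus along the same kind of path.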
( 2
min )
In this paper, we interpret disentanglement as the discovery of local charts
and trace how that definition naturally leads to an equivalent condition for
disentanglement: the disentangled factors must commute with each other. We
discuss the practical and theoretical implications of commutativity, in
particular the compression and disentanglement of generative models. Finally,
we conclude with a discussion of related approaches to disentanglement and how
they relate to our view of disentanglement from the manifold perspective.
( 2
min )
We introduce a method for embedding graphs as vectors in a
structure-preserving manner, showcasing its rich representational capacity and
giving some theoretical properties. Our procedure falls under the bind-and-sum
approach, and we show that our binding operation - the tensor product - is the
most general binding operation that respects the principle of superposition. We
also establish some precise results characterizing the behavior of our method,
and we show that our use of spherical codes achieves a packing upper bound.
Then, we perform experiments showcasing our method's accuracy in various graph
operations even when the number of edges is quite large. Finally, we establish
a link to adjacency matrices, showing that our method is, in some sense, a
generalization of adjacency matrices with applications towards large sparse
graphs.
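A minimal sketch of the bind-and-sum idea with the tensor (outer) product as the binding operation, using random unit vectors as node codes. This is illustrative only: the paper's spherical codes are more carefully constructed. Note the adjacency-matrix link the abstract mentions: with one-hot node codes, the same construction reduces exactly to the adjacency matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 20

# random unit vectors as approximately orthogonal node codes
codes = rng.normal(size=(n, d))
codes /= np.linalg.norm(codes, axis=1, keepdims=True)

def embed(edges):
    """Bind each directed edge (u, v) via an outer product, then sum."""
    G = np.zeros((d, d))
    for u, v in edges:
        G += np.outer(codes[u], codes[v])
    return G

def edge_score(G, u, v):
    """~1 for stored edges, ~0 otherwise (crosstalk shrinks as d grows)."""
    return float(codes[u] @ G @ codes[v])

G = embed([(0, 1), (2, 3), (4, 5)])
```

Querying a stored edge returns a score near 1, while unstored (including reversed) pairs score near 0, which is what makes graph operations on the summed representation possible.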
( 2
min )
In this paper, we study bottleneck identification in networks via extracting
minimax paths. Many real-world networks have stochastic weights for which full
knowledge is not available in advance. Therefore, we model this task as a
combinatorial semi-bandit problem to which we apply a combinatorial version of
Thompson Sampling and establish an upper bound on the corresponding Bayesian
regret. Due to the computational intractability of the problem, we then devise
an alternative problem formulation which approximates the original objective.
Finally, we experimentally evaluate the performance of Thompson Sampling with
the approximate formulation on real-world directed and undirected networks.
( 2
min )
Data pruning algorithms are commonly used to reduce the memory and
computational cost of the optimization process. Recent empirical results reveal
that random data pruning remains a strong baseline and outperforms most
existing data pruning methods in the high compression regime, i.e., where a
fraction of $30\%$ or less of the data is kept. This regime has recently
attracted a lot of interest as a result of the role of data pruning in
improving the so-called neural scaling laws; in [Sorscher et al.], the authors
showed the need for high-quality data pruning algorithms in order to beat the
sample power law.
In this work, we focus on score-based data pruning algorithms and show
theoretically and empirically why such algorithms fail in the high compression
regime. We demonstrate ``No Free Lunch" theorems for data pruning and present
calibration protocols that enhance the performance of existing pruning
algorithms in this high compression regime using randomization.
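To make the calibration idea concrete, here is one illustrative randomized protocol (a sketch under my own assumptions, not the paper's exact procedure): keep the top-scored examples, but reserve part of the kept budget for uniformly random examples, restoring coverage of the regions that pure score-based pruning discards at high compression.

```python
import numpy as np

def prune(scores, keep_frac, random_frac=0.0, rng=None):
    """Keep the top-scored examples, but fill a `random_frac` share of the
    kept budget with uniformly random examples from the remainder."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(scores)
    k = int(keep_frac * n)                 # total examples to keep
    k_rand = int(random_frac * k)          # budget reserved for random picks
    order = np.argsort(scores)[::-1]       # indices, highest score first
    top = order[: k - k_rand]
    rest = np.setdiff1d(np.arange(n), top)
    rand = rng.choice(rest, size=k_rand, replace=False)
    return np.concatenate([top, rand])

# toy scores: example i has score i, so pure score-based pruning at 20%
# compression would keep only examples 80..99
idx = prune(np.arange(100.0), keep_frac=0.2, random_frac=0.5)
```

With `random_frac=0.5`, half of the kept set is the highest-scored examples and half is a uniform sample of the rest.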
( 2
min )
Diffusion models achieve state-of-the-art performance in various generation
tasks. However, their theoretical foundations fall far behind. This paper
studies score approximation, estimation, and distribution recovery of diffusion
models, when data are supported on an unknown low-dimensional linear subspace.
Our result provides sample complexity bounds for distribution estimation using
diffusion models. We show that with a properly chosen neural network
architecture, the score function can be both accurately approximated and
efficiently estimated. Furthermore, the generated distribution based on the
estimated score function captures the data geometric structures and converges
to a close vicinity of the data distribution. The convergence rate depends on
the subspace dimension, indicating that diffusion models can circumvent the
curse of data ambient dimensionality.
( 2
min )
We propose new limiting dynamics for stochastic gradient descent in the small
learning rate regime called stochastic modified flows. These SDEs are driven by
a cylindrical Brownian motion and improve the so-called stochastic modified
equations by having regular diffusion coefficients and by matching the
multi-point statistics. As a second contribution, we introduce distribution
dependent stochastic modified flows which we prove to describe the fluctuating
limiting dynamics of stochastic gradient descent in the small learning rate -
infinite width scaling regime.
( 2
min )
Most works on the fairness of machine learning systems focus on the blind
optimization of common fairness metrics, such as Demographic Parity and
Equalized Odds. In this paper, we conduct a comparative study of several bias
mitigation approaches to investigate their behaviors at a fine grain, the
prediction level. Our objective is to characterize the differences between fair
models obtained with different approaches. With comparable performances in
fairness and accuracy, are the different bias mitigation approaches impacting a
similar number of individuals? Do they mitigate bias in a similar way? Do they
affect the same individuals when debiasing a model? Our findings show that bias
mitigation approaches differ a lot in their strategies, both in the number of
impacted individuals and the populations targeted. More surprisingly, we show
that these results hold even across several runs of the same mitigation
approach.
These findings raise questions about the limitations of the current group
fairness metrics, as well as the arbitrariness, hence unfairness, of the whole
debiasing process.
( 2
min )
We study the problem of discrete distribution estimation in KL divergence and
provide concentration bounds for the Laplace estimator. We show that the
deviation from the mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the
best prior result of $k/n$. We also establish a matching lower bound that shows
that our bounds are tight up to polylogarithmic factors.
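The estimator under study is add-one (Laplace) smoothing. A quick sketch of its KL behavior on a uniform source, with illustrative parameters:

```python
import numpy as np

def laplace_estimate(counts, k):
    """Add-one (Laplace) smoothed estimate over a k-symbol alphabet."""
    n = counts.sum()
    return (counts + 1) / (n + k)

def kl(p, q):
    """KL divergence D(p || q) for strictly positive q."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
k, n = 10, 10_000
p = np.full(k, 1.0 / k)                       # true (uniform) source
counts = np.bincount(rng.choice(k, size=n, p=p), minlength=k)
q = laplace_estimate(counts, k)
```

Since every symbol gets at least mass $1/(n+k)$, the estimate is strictly positive and the KL divergence to the truth is finite, which is what makes KL-risk analysis of this estimator tractable.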
( 2
min )
Machine-learned coarse-grained (CG) models have the potential for simulating
large molecular complexes beyond what is possible with atomistic molecular
dynamics. However, training accurate CG models remains a challenge. A widely
used methodology for learning CG force-fields maps forces from all-atom
molecular dynamics to the CG representation and matches them with a CG
force-field on average. We show that there is flexibility in how to map
all-atom forces to the CG representation, and that the most commonly used
mapping methods are statistically inefficient and potentially even incorrect in
the presence of constraints in the all-atom simulation. We define an
optimization statement for force mappings and demonstrate that substantially
improved CG force-fields can be learned from the same simulation data when
using optimized force maps. The method is demonstrated on the miniproteins
Chignolin and Tryptophan Cage and published as open-source code.
( 2
min )
Hyperbolic spaces have been quite popular in the recent past for representing
hierarchically organized data. Further, several classification algorithms for
data in these spaces have been proposed in the literature. These algorithms
mainly use either hyperplanes or geodesics for decision boundaries in a large
margin classifiers setting leading to a non-convex optimization problem. In
this paper, we propose a novel large margin classifier based on horocycle
(horosphere) decision boundaries that leads to a geodesically convex
optimization problem that can be optimized using any Riemannian gradient
descent technique guaranteeing a globally optimal solution. We present several
experiments depicting the performance of our classifier.
( 2
min )
Hello everyone. I am a software engineering assistant professor at a private university, and I have lots of older lecture videos on my channel.
I am using NVIDIA Broadcast to remove noise, and it works very well.
However, I want to improve the audio quality as well.
After doing a lot of research, I found that audio super-resolution is the way to go.
The only GitHub repo I have found so far is not working.
Any help is appreciated.
How can I improve speech quality?
Here is my example lecture video (noise already removed and reuploaded, but the sound is still not good):
C# Programming For Beginners - Lecture 2: Coding our First Application in .NET Core Console
https://youtu.be/XLsrsCCdSnU
submitted by /u/CeFurkan
Amazon SageMaker JumpStart is the machine learning (ML) hub of SageMaker that offers over 350 built-in algorithms, pre-trained models, and pre-built solution templates to help you get started with ML fast. JumpStart provides one-click access to a wide variety of pre-trained models for common ML tasks such as object detection, text classification, summarization, text generation […]
( 11
min )
Here is a podcast episode with Noam Brown from Meta AI where we discuss his work on achieving human-level performance on poker and Diplomacy, as well as the power of spending compute at inference time!
submitted by /u/thejashGI
AI-augmented applications, photorealistic rendering, simulation and other technologies are helping professionals achieve business-critical results from multi-app workflows faster than ever. Running these data-intensive, complex workflows, as well as sharing data and collaborating across geographically dispersed teams, requires workstations with high-end CPUs, GPUs and advanced networking. To help meet these demands, Intel and NVIDIA are powering […]
( 6
min )
Whether creating realistic digital humans that can express emotion or building immersive virtual worlds, 3D artists can reach new heights with NVIDIA Omniverse, a platform for creating and operating metaverse applications. A new Blender alpha release, now available in the Omniverse Launcher, lets users of the 3D graphics software optimize scenes and streamline workflows with […]
( 5
min )
Surfers, swimmers and beachgoers face a hidden danger in the ocean: rip currents. These narrow channels of water can flow away from the shore at speeds up to 2.5 meters per second, making them one of the biggest safety risks for those enjoying the ocean. To help keep beachgoers safe, Christo Rautenbach, a coastal and […]
( 4
min )
One of the primary goals in spectrum occupancy mapping is to create a system
that is robust to assumptions about the number of sensors, occupancy threshold
(in dBm), sensor noise, number of emitters and the propagation environment. We
show that such a system may be designed with neural networks using a process of
aggregation to allow a variable number of sensors during training and testing.
This process transforms the variable number of measurements into approximate
log-likelihood ratios (LLRs), which are fed as a fixed-resolution image into a
neural network. The use of LLRs provides robustness to the effects of noise
and occupancy threshold. In other words, a system may be trained for a nominal
number of sensors, threshold and noise levels, and still operate well at
various other levels without retraining. Our system operates without knowledge
of the number of emitters and does not explicitly attempt to estimate their
number or power. Receiver operating curves with realistic propagation
environments using topographic maps with commercial network design tools show
how performance of the neural network varies with the environment. The use of
very low-resolution sensors in this system can still yield good performance.
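To illustrate the aggregation step, here is a toy sketch: per-sensor measurements are converted to LLRs under a Gaussian model and summed, so a variable number of sensors collapses into a single fixed-size statistic. The dBm levels and the Gaussian measurement model here are illustrative assumptions, not the paper's propagation model (which the paper builds from topographic maps and network design tools).

```python
import numpy as np

def llr(x, mu_occ, mu_idle, sigma):
    """Per-sensor log-likelihood ratio of 'occupied' vs 'idle' for a power
    measurement x, under a Gaussian measurement model."""
    return ((x - mu_idle) ** 2 - (x - mu_occ) ** 2) / (2 * sigma ** 2)

rng = np.random.default_rng(0)
mu_occ, mu_idle, sigma = -70.0, -90.0, 5.0      # illustrative dBm levels

# variable numbers of sensors collapse into one statistic: the LLR sum
score_occ = llr(rng.normal(mu_occ, sigma, size=7), mu_occ, mu_idle, sigma).sum()
score_idle = llr(rng.normal(mu_idle, sigma, size=4), mu_occ, mu_idle, sigma).sum()
```

A positive aggregate favors "occupied" and a negative one "idle", regardless of how many sensors contributed, which is what lets a network trained at one sensor count operate at another.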
( 2
min )
Q-learning and SARSA with $\epsilon$-greedy exploration are leading
reinforcement learning methods. Their tabular forms converge to the optimal
Q-function under reasonable conditions. However, with function approximation,
these methods exhibit strange behaviors such as policy oscillation, chattering,
and convergence to different attractors (possibly even the worst policy) on
different runs, apart from the usual instability. A theory to explain these
phenomena has been a long-standing open problem, even for basic linear function
approximation (Sutton, 1999). Our work uses differential inclusion to provide
the first framework for resolving this problem. We also provide numerical
examples to illustrate our framework's prowess in explaining these algorithms'
behaviors.
( 2
min )
Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential
Equation (SDE) has allowed researchers to enjoy the benefits of studying a
continuous optimization trajectory while carefully preserving the stochasticity
of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam,
has been challenging because there were no rigorously proven SDE approximations
for these methods. This paper derives the SDE approximations for RMSprop and
Adam, giving theoretical guarantees of their correctness as well as
experimental validation of their applicability to common large-scale vision
and language settings. A key practical result is the derivation of a
$\textit{square root scaling rule}$ to adjust the optimization hyperparameters
of RMSprop and Adam when changing batch size, and its empirical validation in
deep learning settings.
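The abstract's square-root scaling rule is directly actionable: multiplying the batch size by $\kappa$ multiplies the Adam/RMSprop learning rate by $\sqrt{\kappa}$. A one-line helper sketch follows; note the abstract does not say how the momentum constants or $\epsilon$ should co-scale, so this sketch deliberately touches only the learning rate.

```python
import math

def scale_lr_sqrt(lr, batch_size, new_batch_size):
    """Square-root scaling rule: lr -> sqrt(kappa) * lr when the batch
    size is scaled by kappa = new_batch_size / batch_size."""
    return lr * math.sqrt(new_batch_size / batch_size)
```

For example, going from batch size 256 to 1024 (kappa = 4) doubles the learning rate, in contrast to the linear scaling rule commonly used with plain SGD.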
( 2
min )
We provide a first finite-particle convergence rate for Stein variational
gradient descent (SVGD). Specifically, whenever the target distribution is
sub-Gaussian with a Lipschitz score, SVGD with $n$ particles and an appropriate
step size sequence drives the kernel Stein discrepancy to zero at an order
$1/\sqrt{\log \log n}$ rate. We suspect that the dependence on $n$ can be improved,
and we hope that our explicit, non-asymptotic proof strategy will serve as a
template for future refinements.
( 2
min )
Despite the impressive successes of deep learning approaches for various
chemical problems such as property prediction, virtual screening, and de novo
molecule design, separately designed models for specific tasks are usually
required, and it is often difficult to synergistically combine these models for
novel tasks. To address this, here we present a bidirectional molecular
foundation model that can be used for both molecular structure and property
inferences through a single model, inspired by recent multimodal learning
methods such as VLP. Furthermore, thanks to the outstanding structure/property
alignment in a common embedding space, experimental results confirm that our
method leads to state-of-the-art performance and interpretable attention maps
in both multimodal and unimodal tasks, including conditional molecule
generation, property prediction, molecule classification, and reaction
prediction.
( 2
min )
We survey a current, heated debate in the AI research community on whether
large pre-trained language models can be said to "understand" language -- and
the physical and social situations language encodes -- in any important sense.
We describe arguments that have been made for and against such understanding,
and key questions for the broader sciences of intelligence that have arisen in
light of these arguments. We contend that a new science of intelligence can be
developed that will provide insight into distinct modes of understanding, their
strengths and limitations, and the challenge of integrating diverse forms of
cognition.
( 2
min )
A formal write-up of the simple proof (1995) of the existence of calibrated
forecasts by the minimax theorem, which moreover shows that $N^3$ periods
suffice to guarantee a calibration error of at most $1/N$.
( 2
min )
We present ASR Bundestag, a dataset for automatic speech recognition in
German, consisting of 610 hours of aligned audio-transcript pairs for
supervised training as well as 1,038 hours of unlabeled audio snippets for
self-supervised learning, based on raw audio data and transcriptions from
plenary sessions and committee meetings of the German parliament. In addition,
we discuss utilized approaches for the automated creation of speech datasets
and assess the quality of the resulting dataset based on evaluations and
finetuning of a pre-trained state-of-the-art model. We make the dataset
publicly available, including all subsets.
( 2
min )
We propose a new \textit{quadratic programming-based} method of approximating
a nonstandard density using a multivariate Gaussian density. Such nonstandard
densities usually arise while developing posterior samplers for unobserved
components models involving inequality constraints on the parameters. For
instance, Chan et al. (2016) provided a new model of trend inflation with
linear inequality constraints on the stochastic trend. We implemented the
proposed quadratic programming-based method for this model and compared it to
the existing approximation. We observed that the proposed method works as well
as the existing approximation in terms of the final trend estimates while
achieving gains in terms of sample efficiency.
( 2
min )
We develop a new approach to drifting games, a class of two-person games with
many applications to boosting and online learning settings. Our approach
involves (a) guessing an asymptotically optimal potential by solving an
associated partial differential equation (PDE); then (b) justifying the guess,
by proving upper and lower bounds on the final-time loss whose difference
scales like a negative power of the number of time steps. The proofs of our
potential-based upper bounds are elementary, using little more than Taylor
expansion. The proofs of our potential-based lower bounds are also elementary,
combining Taylor expansion with probabilistic or combinatorial arguments. Not
only is our approach more elementary, but we give new potentials and derive
corresponding upper and lower bounds that match each other in the asymptotic
regime.
( 2
min )
The COVID-19 pandemic has significantly impacted the construction sector,
which is sensitive to economic cycles. In order to boost value and efficiency
in this sector, the use of innovative exploration technologies such as
ultrasonic and Artificial Intelligence techniques in building material research
is becoming increasingly crucial. In this study, we developed two models for
predicting the Los Angeles (LA) and Micro Deval (MDE) coefficients, two
important geotechnical tests used to determine the quality of rock aggregates.
These coefficients describe the resistance of aggregates to fragmentation and
abrasion. The ultrasound velocity, porosity, and density of the rocks were
determined and used as inputs to develop prediction models using multiple
regression and an artificial neural network. These models may be used to assess
the quality of rock aggregates at the exploration stage without the need for
tedious laboratory analysis.
( 2
min )
Despite all the benefits of automated hyperparameter optimization (HPO), most
modern HPO algorithms are black-boxes themselves. This makes it difficult to
understand the decision process which leads to the selected configuration,
reduces trust in HPO, and thus hinders its broad adoption. Here, we study the
combination of HPO with interpretable machine learning (IML) methods such as
partial dependence plots. These techniques are more and more used to explain
the marginal effect of hyperparameters on the black-box cost function or to
quantify the importance of hyperparameters. However, if such methods are
naively applied to the experimental data of the HPO process in a post-hoc
manner, the underlying sampling bias of the optimizer can distort
interpretations. We propose a modified HPO method which efficiently balances
the search for the global optimum w.r.t. predictive performance \emph{and} the
reliable estimation of IML explanations of an underlying black-box function by
coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark
cases of both synthetic objectives and HPO of a neural network, we demonstrate
that our method returns more reliable explanations of the underlying black-box
without a loss of optimization performance.
( 2
min )
This manuscript investigates the one-pass stochastic gradient descent (SGD)
dynamics of a two-layer neural network trained on Gaussian data and labels
generated by a similar, though not necessarily identical, target function. We
rigorously analyse the limiting dynamics via a deterministic and
low-dimensional description in terms of the sufficient statistics for the
population risk. Our unifying analysis bridges different regimes of interest,
such as the classical gradient-flow regime of vanishing learning rate, the
high-dimensional regime of large input dimension, and the overparameterised
"mean-field" regime of large network width, covering as well the intermediate
regimes where the limiting dynamics is determined by the interplay between
these behaviours. In particular, in the high-dimensional limit, the
infinite-width dynamics is found to remain close to a low-dimensional subspace
spanned by the target principal directions. Our results therefore provide a
unifying picture of the limiting SGD dynamics with synthetic data.
( 2
min )
This paper empirically studies commonly observed training difficulties of
Physics-Informed Neural Networks (PINNs) on dynamical systems. Our results
indicate that fixed points, which are inherent to these systems, play a key
role in the optimization of the physics loss function embedded in PINNs. We
observe that the loss landscape exhibits local optima that are shaped by the
presence of fixed points. We find that these local optima contribute to the
complexity of optimizing the physics loss, which can explain common training
difficulties and the resulting nonphysical predictions. Under certain settings,
e.g., initial conditions close to fixed points or long simulation times, we
show that those optima can even attain a lower loss than that of the desired
solution.
( 2
min )
We describe a parametrized space for simple meta-reinforcement-learning
(meta-RL) tasks with arbitrary stimuli. The parametrization allows us to
randomly generate an arbitrary number of novel simple meta-learning tasks. The
space of meta-RL tasks covered by this parametrization includes many well-known
meta-RL tasks, such as bandit tasks, the Harlow task, T-mazes, the Daw two-step
task and others. Simple extensions allow it to capture tasks based on
two-dimensional topological spaces, such as find-the-spot or key-door tasks. We
describe a number of randomly generated meta-RL tasks and discuss potential
issues arising from random generation.
( 2
min )
Advances in neural modeling have achieved state-of-the-art (SOTA) results on
public natural language processing (NLP) benchmarks, at times surpassing human
performance. However, there is a gap between public benchmarks and real-world
applications where noise, such as typographical or grammatical mistakes, is
abundant and can result in degraded performance. Unfortunately, works which
evaluate the robustness of neural models on noisy data and propose
improvements, are limited to the English language. Upon analyzing noise in
different languages, we observe that noise types vary greatly across languages.
Thus, existing investigations do not generalize trivially to multilingual
settings. To benchmark the performance of pretrained multilingual language
models, we construct noisy datasets covering five languages and four NLP tasks
and observe a clear gap in the performance between clean and noisy data in the
zero-shot cross-lingual setting. After investigating several ways to boost the
robustness of multilingual models in this setting, we propose Robust
Contrastive Pretraining (RCP). RCP combines data augmentation with a
contrastive loss term at the pretraining stage and achieves large improvements
on noisy (and original test data) across two sentence-level (+3.2%) and two
sequence-labeling (+10 F1-score) multilingual classification tasks.
( 2
min )
As advertisers increasingly shift their budgets toward digital advertising,
forecasting advertising costs is essential for making budget plans to optimize
marketing campaign returns. In this paper, we perform a comprehensive study
using a variety of time-series forecasting methods to predict daily average
cost-per-click (CPC) in the online advertising market. We show that forecasting
advertising costs would benefit from multivariate models using covariates from
competitors' CPC development identified through time-series clustering. We
further interpret the results by analyzing feature importance and temporal
attention. Finally, we show that our approach has several advantages over
models that individual advertisers might build based solely on their collected
data.
( 2
min )
We motivate and introduce CHARD: Clinical Health-Aware Reasoning across
Dimensions, to investigate the capability of text generation models to act as
implicit clinical knowledge bases and generate free-flow textual explanations
about various health-related conditions across several dimensions. We collect
and present an associated dataset, CHARDat, consisting of explanations about 52
health conditions across three clinical dimensions. We conduct extensive
experiments using BART and T5 along with data augmentation, and perform
automatic, human, and qualitative analyses. We show that while our models can
perform decently, CHARD is very challenging with strong potential for further
exploration.
( 2
min )
In addition to the weights of shared synaptic connections, PNN includes
weights for the effective ranges of synapses [14-24]. PNN accounts for
synaptic strength balance, dynamically through the phagocytosis of synapses
and statically through a constant total synapse length [14], and includes the
lead behavior of a school of fish. Synapse formation inhibits dendrite
generation to a certain extent in both experiments and PNN simulations [15].
The memory persistence gradient of the retrograde circuit is similar to
enforcing resilience in a Spring Boot application. The relatively good and
inferior gradient information stored in memory engram cells during synapse
formation of the retrograde circuit resembles the folds of the brain [16].
While it remains controversial whether human hippocampal neurogenesis persists
throughout aging, PNN suggests it may form a new and longer circuit in late
iterations [17,18]. Closing the critical period causes neurological disorders
in both experiments and PNN simulations [19]. Considering the persistence of
both negative and positive memories helps activate synapse length changes over
iterations better than considering positive memory alone [20]. Astrocytic
phagocytosis avoids the local accumulation of synapses in simulation; a lack
of astrocytic phagocytosis causes excitatory and functionally impaired
synapses to accumulate in experiments, leading to impaired cognition, and
produces longer local synapses and worse results in PNN simulations [21].
This relates intelligence to cortical thickness and to individual differences
in the brain [22]. PNN also considers the memory engram cells that strengthen
synaptic strength [23]. The effects of PNN's memory structure and tPBM may be
the same owing to the powerful penetrability of signals [24]. Memory
persistence also inhibits local synaptic accumulation. Through PNN, the
relatively good and inferior solutions may be introduced into PSO. The simple
PNN has only synaptic phagocytosis.
( 3
min )
Reinforcement learning is an effective way to solve decision-making problems,
and investigating autonomous air-combat maneuver decision-making methods based
on reinforcement learning is a meaningful and valuable direction. However, when
reinforcement learning is used to solve decision-making problems with sparse
rewards, such as air-combat maneuver decision-making, training costs too much
time and the performance of the trained agent may not be satisfactory. To solve
these problems, a method based on curriculum learning is proposed. First, three
curricula for air-combat maneuver decision-making are designed: an angle
curriculum, a distance curriculum, and a hybrid curriculum. These curricula are
used to train air-combat agents separately and are compared with the original
method without any curriculum. The training results show that the angle
curriculum can increase the speed and stability of training and improve the
performance of the agent; the distance curriculum can increase the speed and
stability of agent training; and the hybrid curriculum has a negative impact on
training because it makes the agent get stuck at a local optimum. The
simulation results show that, after training, the agent can handle situations
where targets come from different directions, and the maneuver decision results
are consistent with the characteristics of the missile.
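As an illustrative sketch of the staging idea (the function name and episode thresholds here are hypothetical, not from the paper), a curriculum can be implemented as a schedule that maps training progress to task difficulty:

```python
def curriculum_stage(episode, boundaries=(1000, 3000)):
    """Map an episode index to a curriculum stage.

    Stage 0 could correspond to an angle-only task, stage 1 to a distance
    task, and stage 2 to the full air-combat maneuver task; the agent
    graduates to the next stage once it passes each episode boundary.
    """
    stage = 0
    for b in boundaries:
        if episode >= b:
            stage += 1
    return stage
```

The training loop would then draw initial conditions (e.g. relative angle and distance to the target) from the distribution associated with the current stage.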
( 2
min )
Traffic signal control is safety-critical for our daily life. Roughly
one-quarter of road accidents in the U.S. happen at intersections due to
problematic signal timing, urging the development of safety-oriented
intersection control. However, existing studies on adaptive traffic signal
control using reinforcement learning technologies have focused mainly on
minimizing traffic delay while neglecting potential exposure to unsafe
conditions. We, for the first time, incorporate road safety standards as
enforcement to ensure the safety of existing reinforcement learning methods,
aiming toward operating intersections with zero collisions. We have proposed a
safety-enhanced residual reinforcement learning method (SafeLight) and employed
multiple optimization techniques, such as multi-objective loss function and
reward shaping for better knowledge integration. Extensive experiments are
conducted using both synthetic and real-world benchmark datasets. Results show
that our method can significantly reduce collisions while increasing traffic
mobility.
( 2
min )
This work studies discrete diffusion probabilistic models with applications
to natural language generation. We derive an alternative yet equivalent
formulation of the sampling from discrete diffusion processes and leverage this
insight to develop a family of reparameterized discrete diffusion models. The
derived generic framework is highly flexible, offers a fresh perspective of the
generation process in discrete diffusion models, and features more effective
training and decoding techniques. We conduct extensive experiments to evaluate
the text generation capability of our model, demonstrating significant
improvements over existing diffusion models.
( 2
min )
We consider the problem of learning multioutput function classes in batch and
online settings. In both settings, we show that a multioutput function class is
learnable if and only if each single-output restriction of the function class
is learnable. This provides a complete characterization of the learnability of
multilabel classification and multioutput regression in both batch and online
settings. As an extension, we also consider multilabel learnability in the
bandit feedback setting and show a similar characterization as in the
full-feedback setting.
( 2
min )
In this paper, we extend the Wiener-Ito chaos decomposition to the class of
diffusion processes, whose drift and diffusion coefficient are of linear
growth. By omitting the orthogonality in the chaos expansion, we are able to
show that every $p$-integrable functional, for $p \in [1,\infty)$, can be
represented as a sum of iterated integrals of the underlying process. Using a
truncated sum of this expansion and (possibly random) neural networks for the
integrands, whose parameters are learned in a machine learning setting, we show
that every financial derivative can be approximated arbitrarily well in the
$L^p$-sense. Since the hedging strategy of the approximating option can be
computed in closed form, we obtain an efficient algorithm that can replicate
any integrable financial derivative with short runtime.
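Schematically, the truncated expansion described above takes the form (the precise integrands and integration ranges are as in the paper; this display is only a sketch)

$$ F \;\approx\; \sum_{k=0}^{K} \int_0^T \int_0^{t_1} \cdots \int_0^{t_{k-1}} \phi_k(t_1,\dots,t_k)\, dX_{t_k} \cdots dX_{t_1}, $$

where $X$ is the underlying diffusion and each integrand $\phi_k$ is parameterized by a (possibly random) neural network whose weights are learned by minimizing the $L^p$ distance to the payoff $F$.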
( 2
min )
The tremendous growth in smart devices has given rise to several security
threats. One of the most prominent threats is malicious software, also known as
malware. Malware is capable of corrupting a device and collapsing an entire
network; therefore, its early detection and mitigation are extremely important
to avoid catastrophic effects. In this work, we propose a solution for malware
detection using state-of-the-art natural language processing (NLP) techniques.
Our main focus is to provide a lightweight yet effective classifier for malware
detection that can be used across heterogeneous devices, be it a
resource-constrained device or a resourceful machine. Our proposed model is
tested on the benchmark dataset with an accuracy of 99.13 percent and a
log-loss score of 0.04.
( 2
min )
Motivated by neural network training in low-bit floating and fixed-point
environments, this work studies the convergence of variants of SGD with
computational error. Considering a general stochastic Lipschitz continuous loss
function, a novel convergence result to a Clarke stationary point is presented
assuming that only an approximation of its stochastic gradient can be computed
as well as error in computing the SGD step itself. Different variants of SGD
are then tested empirically in a variety of low-precision arithmetic
environments, where improved test set accuracy is observed compared to SGD for
two image recognition tasks.
( 2
min )
Gradient descent methods have long been the de facto standard for training
deep neural networks. Millions of training samples are fed into models with
billions of parameters, which are slowly updated over hundreds of epochs.
Recently, it has been shown that large, randomly initialized neural networks
contain subnetworks that perform as well as fully trained models. This insight
offers a promising avenue for training future neural networks by simply pruning
weights from large, random models. However, this problem is combinatorially
hard, and classical algorithms are not efficient at finding the best subnetwork.
In this paper, we explore how quantum algorithms could be formulated and
applied to this neuron selection problem. We introduce several methods for
local quantum neuron selection that reduce the entanglement complexity that
large scale neuron selection would require, making this problem more tractable
for current quantum hardware.
( 2
min )
Text-based game environments are challenging because agents must deal with
long sequences of text, execute compositional actions using text and learn from
sparse rewards. We address these challenges by proposing Long-Context Language
Decision Transformers (LLDTs), a framework that is based on long transformer
language models and decision transformers (DTs). LLDTs extend DTs with 3
components: (1) exponential tilt to guide the agent towards high obtainable
goals, (2) novel goal conditioning methods yielding significantly better
results than the traditional return-to-go (sum of all future rewards), and (3)
a model of future observations. Our ablation results show that predicting
future observations improves agent performance. To the best of our knowledge,
LLDTs are the first to address offline RL with DTs on these challenging games.
Our experiments show that LLDTs achieve the highest scores among many different
types of agents on some of the most challenging Jericho games, such as
Enchanter.
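The traditional return-to-go conditioning mentioned above is simply the sum of all future rewards at each timestep; a minimal sketch:

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at step t: the sum of all future rewards from t onward
    (undiscounted when gamma=1, as in the standard decision-transformer
    conditioning the abstract refers to)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg
```

During training, each timestep's input is conditioned on its return-to-go; LLDTs replace this signal with their alternative goal-conditioning schemes.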
( 2
min )
Graph Neural Networks (GNNs) have achieved much success on graph-structured
data. In light of this, there has been increasing interest in studying their
expressive power. One line of work studies the capability of GNNs to
approximate permutation-invariant functions on graphs, and another focuses on
their power as tests for graph isomorphism. Our work connects these two
perspectives and proves their equivalence. We further develop a framework of
the expressive power of GNNs that incorporates both of these viewpoints using
the language of sigma-algebra, through which we compare the expressive power of
different types of GNNs together with other graph isomorphism tests. In
particular, we prove that the second-order Invariant Graph Network fails to
distinguish non-isomorphic regular graphs with the same degree. Then, we extend
it to a new architecture, Ring-GNN, which succeeds in distinguishing these
graphs and achieves good performances on real-world datasets.
( 2
min )
Recently, \cite{montasser2019vc} showed that finite VC dimension is not
sufficient for \textit{proper} adversarially robust PAC learning. In light of
this hardness result, there is a growing effort to study what type of
relaxations to the adversarially robust PAC learning setup can enable proper
learnability. In this work, we initiate the study of proper learning under
relaxations of the worst-case robust loss. We give a family of robust loss
relaxations under which VC classes are properly PAC learnable with sample
complexity close to what one would require in the standard PAC learning setup.
On the other hand, we show that for an existing and natural relaxation of the
worst-case robust loss, finite VC dimension is not sufficient for proper
learning. Lastly, we give new generalization guarantees for the adversarially
robust empirical risk minimizer.
( 2
min )
We prove a convergence theorem for U-statistics of degree two, where the data
dimension $d$ is allowed to scale with sample size $n$. We find that the
limiting distribution of a U-statistic undergoes a phase transition from the
non-degenerate Gaussian limit to the degenerate limit, regardless of its
degeneracy and depending only on a moment ratio. A surprising consequence is
that a non-degenerate U-statistic in high dimensions can have a non-Gaussian
limit with a larger variance and asymmetric distribution. Our bounds are valid
for any finite $n$ and $d$, independent of individual eigenvalues of the
underlying function, and dimension-independent under a mild assumption. As an
application, we apply our theory to two popular kernel-based distribution
tests, MMD and KSD, whose high-dimensional performance has been challenging to
study. In a simple empirical setting, our results correctly predict how the
test power at a fixed threshold scales with $d$ and the bandwidth.
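For concreteness, the MMD statistic referenced above is itself a degree-two U-statistic; a minimal unbiased estimator of squared MMD with a Gaussian kernel (bandwidth and data dimensions here are illustrative):

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased (U-statistic) estimate of squared MMD with a Gaussian kernel.
    The theory above predicts how tests built on this statistic behave as the
    data dimension and the bandwidth grow."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    np.fill_diagonal(Kxx, 0.0)   # drop i == j terms for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()
```

On samples from the same distribution the estimate fluctuates around zero; on well-separated distributions it is clearly positive.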
( 2
min )
Deep neural networks (DNNs) have shown great capacity for modeling dynamical
systems; nevertheless, they usually do not obey physics constraints such as
conservation laws. This paper proposes a new learning framework, named
ConCerNet, that improves the trustworthiness of DNN-based dynamics modeling by
endowing it with invariant properties. ConCerNet consists of two steps: (i) a contrastive
learning method to automatically capture the system invariants (i.e.
conservation properties) along the trajectory observations; (ii) a neural
projection layer to guarantee that the learned dynamics models preserve the
learned invariants. We theoretically prove the functional relationship between
the learned latent representation and the unknown system invariant function.
Experiments show that our method consistently outperforms the baseline neural
networks in both coordinate error and conservation metrics by a large margin.
With neural network based parameterization and no dependence on prior
knowledge, our method can be extended to complex and large-scale dynamics by
leveraging an autoencoder.
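To illustrate the projection idea (this is a generic sketch of projecting onto a learned invariant's level set, not the paper's exact layer):

```python
import numpy as np

def project_to_invariant(x, g, grad_g, c):
    """One linearized Newton-style projection step toward the level set
    {x : g(x) = c}. A neural projection layer of the kind described above
    could apply such a correction so that predicted states stay on the
    learned conservation-law surface."""
    gr = grad_g(x)
    return x - (g(x) - c) * gr / (gr @ gr)
```

Iterating this step converges rapidly to the level set when started nearby, e.g. restoring a quadratic "energy" invariant after a noisy prediction.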
( 2
min )
In this work, we consider the stochastic optimal control problem in
continuous time and a policy gradient method to solve it. In particular, we
study the gradient flow for the control, viewed as a continuous time limit of
the policy gradient. We prove the global convergence of the gradient flow and
establish a convergence rate under some regularity assumptions. The main
novelty in the analysis is the notion of local optimal control function, which
is introduced to compare the local optimality of the iterate.
( 2
min )
Human-robot interaction (HRI) research is progressively addressing
multi-party scenarios, where a robot interacts with more than one human user at
the same time. Conversely, research is still at an early stage for human-robot
collaboration (HRC). The use of machine learning techniques to handle this type
of collaboration requires data that are harder to produce than in a typical HRC
setup. This work outlines design concepts for concurrent tasks
for non-dyadic HRC applications. Based upon these concepts, this study also
proposes an alternative way of gathering data regarding multiuser activity, by
collecting data related to single subjects and merging them in post-processing,
to reduce the effort involved in producing recordings of pair settings. To
validate this statement, 3D skeleton poses of activity of single subjects were
collected and merged in pairs. After this, the datapoints were used to
separately train a long short-term memory (LSTM) network and a variational
autoencoder (VAE) composed of spatio-temporal graph convolutional networks
(STGCN) to recognise the joint activities of the pairs of people. The results
showed that it is possible to make use of data collected in this way for pair
HRC settings and get similar performances compared to using data regarding
groups of users recorded under the same settings, relieving researchers of the
technical difficulties involved in producing such data.
( 2
min )
In a recent paper, Ling et al. investigated the over-parametrized Deep
Equilibrium Model (DEQ) with ReLU activation and proved that the gradient
descent converges to a globally optimal solution at a linear convergence rate
for the quadratic loss function. In this paper, we show that this fact still
holds for DEQs with any general activation which has bounded first and second
derivatives. Since the new activation function is generally non-linear, a
general population Gram matrix is designed, and a new form of dual activation
with Hermite polynomial expansion is developed.
( 2
min )
Non-intrusive load monitoring (NILM) aims to decompose aggregated electrical
usage signal into appliance-specific power consumption and it amounts to a
classical example of blind source separation tasks. Leveraging recent progress
on deep learning techniques, we design a new neural NILM model Multi-State Dual
CNN (MSDC). Different from previous models, MSDC explicitly extracts
information about the appliance's multiple states and state transitions, which
in turn regulates the prediction of signals for appliances. More specifically,
we employ a dual-CNN architecture: one CNN for outputting state distributions
and the other for predicting the power of each state. We introduce a new
technique that utilizes conditional random fields (CRFs) to capture state
transitions. Experiments on two real-world datasets, REDD and UK-DALE,
demonstrate that our model significantly outperforms state-of-the-art models
while having good generalization capacity, achieving a 6%-10% MAE gain and a
33%-51% SAE gain on unseen appliances.
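The two reported metrics can be computed as follows (one common NILM formulation; the paper's exact normalization may differ):

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error between predicted and ground-truth power traces."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

def sae(pred, true):
    """Signal aggregate error: relative error of the total predicted energy
    over the evaluation window (a common NILM definition)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(abs(pred.sum() - true.sum()) / true.sum())
```

MAE penalizes per-timestep errors, while SAE only measures whether total consumption is attributed correctly, so a model can score well on one and poorly on the other.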
( 2
min )
With the increased usage of artificial intelligence (AI), it is imperative to
understand how these models work internally. These needs have led to the
development of a new field called eXplainable artificial intelligence (XAI).
This field consists of a set of techniques that allow us to theoretically
determine the cause of AI decisions. One unsolved question in XAI is how
to measure the quality of explanations. In this study, we propose a new method
to generate datasets with ground truth (GT). These datasets allow us to measure
how faithful a method is without ad hoc solutions. We conducted a set of
experiments that compared our GT with real model explanations and obtained
excellent results confirming that our proposed method is correct.
( 2
min )
Engineering more secure software has become a critical challenge in the cyber
world. It is very important to develop methodologies, techniques, and tools for
developing secure software. To develop secure software, developers need to
think like an attacker, which can be supported by mining software repositories:
analyzing and understanding the data repositories related to software
development in order to support its decision-making process. There are
different vulnerability databases, such as the Common Weakness Enumeration
(CWE), the Common Vulnerabilities and Exposures database (CVE), and CAPEC; we
utilized the MITRE ATT&CK database. MITRE ATT&CK tactics and techniques have been used in various
ways and methods, but tools for utilizing these tactics and techniques in the
early stages of the software development life cycle (SDLC) are lacking. In this
paper, we use machine learning algorithms to map requirements to the MITRE
ATT&CK database and determine the accuracy of each mapping depending on the
data split.
( 2
min )
Studies have shown that large pretrained language models exhibit biases
against social groups based on race, gender etc, which they inherit from the
datasets they are trained on. Various researchers have proposed mathematical
tools for quantifying and identifying these biases. There have been methods
proposed to mitigate such biases. In this paper, we present a comprehensive
quantitative evaluation of different kinds of biases, such as race, gender,
ethnicity, and age, exhibited by popular pretrained language models such as
BERT and GPT-2. We also present a toolkit that provides plug-and-play
interfaces connecting these bias-identification tools to large pretrained
language models, and that gives users the opportunity to test custom models
against these metrics. The toolkit also
allows users to debias existing and custom models using the debiasing
techniques proposed so far. The toolkit is available at
https://github.com/HrishikeshVish/Fairpy.
( 2
min )
Recent advances in instruction-following large language models (LLMs) have
led to dramatic improvements in a range of NLP tasks. Unfortunately, we find
that the same improved capabilities amplify the dual-use risks for malicious
purposes of these models. Dual-use is difficult to prevent as
instruction-following capabilities now enable standard attacks from computer
security. The capabilities of these instruction-following LLMs provide strong
economic incentives for dual-use by malicious actors. In particular, we show
that instruction-following LLMs can produce targeted malicious content,
including hate speech and scams, bypassing in-the-wild defenses implemented by
LLM API vendors. Our analysis shows that this content can be generated
economically and at a cost likely lower than with human effort alone. Together,
our findings suggest that LLMs will increasingly attract more sophisticated
adversaries and attacks, and addressing these attacks may require new
approaches to mitigations.
( 2
min )
Modern NLP systems exhibit a range of biases, which a growing literature on
model debiasing attempts to correct. However, current progress is hampered by a
plurality of definitions of bias, means of quantification, and an oftentimes
vague relation between debiasing algorithms and theoretical measures of bias. This
paper seeks to clarify the current situation and plot a course for meaningful
progress in fair learning, with two key contributions: (1) making clear
inter-relations among the current gamut of methods, and their relation to
fairness theory; and (2) addressing the practical problem of model selection,
which involves a trade-off between fairness and accuracy and has led to
systemic issues in fairness research. Putting them together, we make several
recommendations to help shape future work.
( 2
min )
In recent years, the number of technology enthusiasts has been increasing
rapidly with the prevalence of technological products and easy access to the
internet, and the number of people working behind this rapid development is
rising tremendously. Computer programmers make up a large portion of these
tech-savvy people. Codeforces is an online programming and contest-hosting
platform used by many competitive programmers worldwide; it is regarded as one
of the most standardized platforms for practicing programming problems and
participating in programming contests. In this research, we propose a framework
that predicts the performance of any particular contestant in the upcoming
competitions as well as predicts the rating after that contest based on their
practice and the performance of their previous contests.
( 2
min )
We present a novel momentum-based first order optimization method (AGNES)
which provably achieves acceleration for convex minimization, even if the
stochastic noise in the gradient estimates is many orders of magnitude larger
than the gradient itself. Here we model the noise as having a variance which is
proportional to the magnitude of the underlying gradient. We argue, based upon
empirical evidence, that this is appropriate for mini-batch gradients in
overparameterized deep learning. Furthermore, we demonstrate that the method
achieves competitive performance in the training of CNNs on MNIST and CIFAR-10.
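A generic Nesterov-style momentum sketch under the noise model described above (not the paper's exact AGNES updates; step sizes and the noise scale are illustrative), where the stochastic gradient carries multiplicative noise proportional to the true gradient:

```python
import numpy as np

def momentum_sgd(grad, x0, lr=0.05, momentum=0.9, steps=200, rng=None):
    """Nesterov-style momentum descent with gradient noise whose standard
    deviation is proportional to the gradient's magnitude, as in the
    variance model the abstract assumes."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x + momentum * v)                        # look-ahead gradient
        g_noisy = g * (1 + 0.5 * rng.standard_normal(g.shape))
        v = momentum * v - lr * g_noisy
        x = x + v
    return x
```

Because the noise scales with the gradient, it shrinks near the minimizer, which is what makes acceleration possible despite large relative noise.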
( 2
min )
We consider the sequential decision-making problem where the mean outcome is
a non-linear function of the chosen action. Compared with the linear model, two
curious phenomena arise in non-linear models: first, in addition to the
"learning phase" with a standard parametric rate for estimation or regret,
there is a "burn-in period" with a fixed cost determined by the non-linear
function; second, achieving the smallest burn-in cost requires new exploration
algorithms. For a special family of non-linear functions named ridge functions
in the literature, we derive upper and lower bounds on the optimal burn-in
cost, and in addition, on the entire learning trajectory during the burn-in
period via differential equations. In particular, a two-stage algorithm that
first finds a good initial action and then treats the problem as locally linear
is statistically optimal. In contrast, several classical algorithms, such as
UCB and algorithms relying on regression oracles, are provably suboptimal.
( 2
min )
We establish a dataset of over $1.6\times10^4$ experimental images of
Bose--Einstein condensates containing solitonic excitations to enable machine
learning (ML) for many-body physics research. About $33~\%$ of this dataset has
manually assigned and carefully curated labels. The remainder is automatically
labeled using SolDet -- an implementation of a physics-informed ML data
analysis framework -- consisting of a convolutional-neural-network-based
classifier and OD as well as a statistically motivated physics-informed
classifier and a quality metric. This technical note constitutes the definitive
reference of the dataset, providing an opportunity for the data science
community to develop more sophisticated analysis tools, to further understand
nonlinear many-body physics, and even advance cold atom experiments.
( 2
min )
We provide a first finite-particle convergence rate for Stein variational
gradient descent (SVGD). Specifically, whenever the target distribution is
sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate
step size sequence drives the kernel Stein discrepancy to zero at an order
1/sqrt(log log n) rate. We suspect that the dependence on n can be improved,
and we hope that our explicit, non-asymptotic proof strategy will serve as a
template for future refinements.
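A minimal sketch of the algorithm the result above analyzes, for one-dimensional particles with an RBF kernel (step size and bandwidth are illustrative):

```python
import numpy as np

def svgd_step(x, score, h=1.0, eps=0.1):
    """One SVGD update for 1-D particles x given the target's score function.
    With the kernel k(a, b) = exp(-(a-b)^2 / (2 h^2)), the kernel-weighted
    score term transports particles toward high density, while the
    kernel-gradient term repels them from one another."""
    diff = x[:, None] - x[None, :]           # diff[j, i] = x_j - x_i
    K = np.exp(-diff ** 2 / (2 * h ** 2))    # K[j, i] = k(x_j, x_i)
    grad_k = -diff / h ** 2 * K              # d/dx_j of k(x_j, x_i)
    phi = (K * score(x)[:, None] + grad_k).mean(axis=0)
    return x + eps * phi
```

Iterating the step on a standard-normal target (score(z) = -z) drives particles initialized far away toward the target, which is the behavior the kernel Stein discrepancy bound quantifies.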
( 2
min )
Despite all the benefits of automated hyperparameter optimization (HPO), most
modern HPO algorithms are black-boxes themselves. This makes it difficult to
understand the decision process which leads to the selected configuration,
reduces trust in HPO, and thus hinders its broad adoption. Here, we study the
combination of HPO with interpretable machine learning (IML) methods such as
partial dependence plots. These techniques are increasingly used to explain
the marginal effect of hyperparameters on the black-box cost function or to
quantify the importance of hyperparameters. However, if such methods are
naively applied to the experimental data of the HPO process in a post-hoc
manner, the underlying sampling bias of the optimizer can distort
interpretations. We propose a modified HPO method which efficiently balances
the search for the global optimum w.r.t. predictive performance \emph{and} the
reliable estimation of IML explanations of an underlying black-box function by
coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark
cases of both synthetic objectives and HPO of a neural network, we demonstrate
that our method returns more reliable explanations of the underlying black-box
without a loss of optimization performance.
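The partial-dependence estimator mentioned above can be sketched as follows (this is the standard IML estimator applied to observed configurations, not the paper's coupled Bayesian optimization / Bayesian Algorithm Execution procedure):

```python
import numpy as np

def partial_dependence(f, X, feature, grid):
    """Estimate the marginal effect of one hyperparameter on a black-box
    cost f: for each grid value, clamp that column of the observed
    configurations X and average f over the remaining columns."""
    values = []
    for v in grid:
        Xv = np.array(X, dtype=float)
        Xv[:, feature] = v
        values.append(float(f(Xv).mean()))
    return np.array(values)
```

The sampling-bias problem the paper addresses arises precisely because HPO concentrates the rows of X near the optimum, which can distort these averages.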
( 2
min )
We present a continuous-time probabilistic approach for estimating the chirp
signal and its instantaneous frequency function when the true forms of these
functions are not accessible. Our model represents these functions by
non-linearly cascaded Gaussian processes represented as non-linear stochastic
differential equations. The posterior distribution of the functions is then
estimated with stochastic filters and smoothers. We compute a (posterior)
Cram\'er--Rao lower bound for the Gaussian process model, and derive a
theoretical upper bound for the estimation error in the mean squared sense. The
experiments show that the proposed method outperforms a number of
state-of-the-art methods on synthetic data. We also show that the method
works out-of-the-box for two real-world datasets.
( 2
min )
A formal write-up of the simple proof (1995) of the existence of calibrated
forecasts by the minimax theorem, which moreover shows that $N^3$ periods
suffice to guarantee a calibration error of at most $1/N$.
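For concreteness, the calibration error being bounded can be computed directly. Below is a minimal sketch (the function name and binning convention are illustrative, not from the write-up) that snaps forecasts to the grid $\{0, 1/N, \dots, 1\}$ and measures the frequency-weighted gap between each forecast value and the empirical frequency of the outcome:

```python
import numpy as np

def calibration_error(forecasts, outcomes, N):
    """L1 calibration error of probability forecasts of a binary outcome,
    with forecasts snapped to the grid {0, 1/N, ..., 1}."""
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    T = len(forecasts)
    bins = np.round(forecasts * N).astype(int)  # index of nearest grid point
    err = 0.0
    for b in np.unique(bins):
        mask = bins == b
        p = b / N                      # forecast value for this bin
        freq = outcomes[mask].mean()   # empirical frequency of the outcome
        err += (mask.sum() / T) * abs(freq - p)
    return err

# Perfectly calibrated: the 0.5 forecasts see the outcome half the time.
print(calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0], N=2))  # 0.0
```

A forecaster achieving error at most $1/N$ under a metric of this kind is what the $N^3$-period guarantee refers to.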
( 2
min )
The limit of infinite width allows for substantial simplifications in the
analytical study of over-parameterised neural networks. With a suitable random
initialisation, an extremely large network exhibits an approximately Gaussian
behaviour. In the present work, we establish a similar result for a simple
stochastic architecture whose parameters are random variables, holding both
before and during training. The explicit evaluation of the output distribution
allows for a PAC-Bayesian training procedure that directly optimises the
generalisation bound. For a large but finite-width network, we show empirically
on MNIST that this training approach can outperform standard PAC-Bayesian
methods.
( 2
min )
Recently there has been rising interest in research on mean field
optimization, in particular because of its role in analyzing the training of
neural networks. In this paper, by adding the Fisher information as the
regularizer, we relate the regularized mean field optimization problem to a
so-called mean field Schrodinger dynamics. We develop an energy-dissipation
method to show that the marginal distributions of the mean field Schrodinger
dynamics converge exponentially quickly towards the unique minimizer of the
regularized optimization problem. Remarkably, the mean field Schrodinger
dynamics is proved to be a gradient flow on the probability measure space with
respect to the relative entropy. Finally we propose a Monte Carlo method to
sample the marginal distributions of the mean field Schrodinger dynamics.
( 2
min )
We consider the task of representing signals supported on graph bundles,
which are generalizations of product graphs that allow for "twists" in the
product structure. Leveraging the localized product structure of a graph
bundle, we demonstrate how a suitable partition of unity over the base graph
can be used to lift the signal on the graph into a space where a product
factorization can be readily applied. Motivated by the locality of this
procedure, we demonstrate that bases for the signal spaces of the components of
the graph bundle can be lifted in the same way, yielding a basis for the signal
space of the total graph. We demonstrate this construction on synthetic graphs,
as well as with an analysis of the energy landscape of conformational manifolds
in stereochemistry.
( 2
min )
This manuscript investigates the one-pass stochastic gradient descent (SGD)
dynamics of a two-layer neural network trained on Gaussian data and labels
generated by a similar, though not necessarily identical, target function. We
rigorously analyse the limiting dynamics via a deterministic and
low-dimensional description in terms of the sufficient statistics for the
population risk. Our unifying analysis bridges different regimes of interest,
such as the classical gradient-flow regime of vanishing learning rate, the
high-dimensional regime of large input dimension, and the overparameterised
"mean-field" regime of large network width, covering as well the intermediate
regimes where the limiting dynamics is determined by the interplay between
these behaviours. In particular, in the high-dimensional limit, the
infinite-width dynamics is found to remain close to a low-dimensional subspace
spanned by the target principal directions. Our results therefore provide a
unifying picture of the limiting SGD dynamics with synthetic data.
( 2
min )
We present a novel momentum-based first order optimization method (AGNES)
which provably achieves acceleration for convex minimization, even if the
stochastic noise in the gradient estimates is many orders of magnitude larger
than the gradient itself. Here we model the noise as having a variance which is
proportional to the magnitude of the underlying gradient. We argue, based upon
empirical evidence, that this is appropriate for mini-batch gradients in
overparameterized deep learning. Furthermore, we demonstrate that the method
achieves competitive performance in the training of CNNs on MNIST and CIFAR-10.
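The noise model is easy to illustrate. The sketch below uses plain heavy-ball momentum (an illustrative stand-in, not the AGNES update itself, with arbitrarily chosen parameter values) to minimize $f(x) = x^2/2$ with a gradient oracle whose noise standard deviation scales with the gradient magnitude:

```python
import numpy as np

def momentum_run(grad_fn, x0=5.0, lr=0.05, beta=0.9, steps=500):
    """Plain heavy-ball momentum (an illustrative stand-in, not AGNES)."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad_fn(x)
        x = x + v
    return x

rng = np.random.default_rng(0)

def noisy_grad(x, sigma=1.0):
    # Gradient of f(x) = x^2 / 2 with noise std proportional to |gradient|
    # (the paper's setting allows noise far larger than the gradient).
    return x * (1.0 + sigma * rng.standard_normal())

print(momentum_run(noisy_grad))   # stochastic run under multiplicative noise
print(momentum_run(lambda x: x))  # noiseless run converges to ~0
```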
( 2
min )
The Distributional Random Forest (DRF) is a recently introduced Random Forest
algorithm to estimate multivariate conditional distributions. Due to its
general estimation procedure, it can be employed to estimate a wide range of
targets such as conditional average treatment effects, conditional quantiles,
and conditional correlations. However, only results about the consistency and
convergence rate of the DRF prediction are available so far. We characterize
the asymptotic distribution of DRF and develop a bootstrap approximation of it.
This allows us to derive inferential tools for quantifying standard errors and
the construction of confidence regions that have asymptotic coverage
guarantees. In simulation studies, we empirically validate the developed theory
for inference of low-dimensional targets and for testing distributional
differences between two populations.
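The bootstrap machinery involved is the standard nonparametric one. A generic sketch (not DRF-specific; the statistic and data here are placeholders) of estimating a standard error by resampling:

```python
import numpy as np

def bootstrap_se(data, stat, B=2000, seed=0):
    """Nonparametric bootstrap standard error of a statistic: recompute the
    statistic on B resamples drawn with replacement, report their spread."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = [stat(data[rng.integers(0, n, size=n)]) for _ in range(B)]
    return float(np.std(reps, ddof=1))

x = np.random.default_rng(1).normal(size=200)
print(bootstrap_se(x, np.mean))  # close to 1/sqrt(200) ~ 0.07 for this sample
```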
( 2
min )
We prove a convergence theorem for U-statistics of degree two, where the data
dimension $d$ is allowed to scale with sample size $n$. We find that the
limiting distribution of a U-statistic undergoes a phase transition from the
non-degenerate Gaussian limit to the degenerate limit, regardless of its
degeneracy and depending only on a moment ratio. A surprising consequence is
that a non-degenerate U-statistic in high dimensions can have a non-Gaussian
limit with a larger variance and asymmetric distribution. Our bounds are valid
for any finite $n$ and $d$, independent of individual eigenvalues of the
underlying function, and dimension-independent under a mild assumption. As an
application, we apply our theory to two popular kernel-based distribution
tests, MMD and KSD, whose high-dimensional performance has been challenging to
study. In a simple empirical setting, our results correctly predict how the
test power at a fixed threshold scales with $d$ and the bandwidth.
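MMD is itself the canonical example of a degree-two U-statistic. A small sketch of the unbiased MMD$^2$ estimator with a Gaussian kernel (sample sizes and bandwidth are arbitrary choices for illustration):

```python
import numpy as np

def rbf(x, y, bw=1.0):
    """Gaussian (RBF) kernel between two points."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * bw ** 2))

def mmd2_ustat(X, Y, bw=1.0):
    """Unbiased MMD^2: a degree-two U-statistic built from kernel evaluations."""
    n, m = len(X), len(Y)
    xx = sum(rbf(X[i], X[j], bw) for i in range(n) for j in range(n) if i != j)
    yy = sum(rbf(Y[i], Y[j], bw) for i in range(m) for j in range(m) if i != j)
    xy = sum(rbf(X[i], Y[j], bw) for i in range(n) for j in range(m))
    return xx / (n * (n - 1)) + yy / (m * (m - 1)) - 2 * xy / (n * m)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))   # sample from P
X2 = rng.normal(0.0, 1.0, size=(50, 2))  # independent sample from P
Y = rng.normal(2.0, 1.0, size=(50, 2))   # sample from a shifted Q

print(mmd2_ustat(X, X2))  # near zero: same distribution
print(mmd2_ustat(X, Y))   # clearly positive: distributions differ
```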
( 2
min )
Electronic health records (EHR) often contain sensitive medical information
about individual patients, posing significant limitations to sharing or
releasing EHR data for downstream learning and inferential tasks. We use
normalizing flows (NF), a family of deep generative models, to estimate the
probability density of a dataset with differential privacy (DP) guarantees,
from which privacy-preserving synthetic data are generated. We apply the
technique to an EHR dataset containing patients with pulmonary hypertension. We
assess the learning and inferential utility of the synthetic data by comparing
the accuracy in the prediction of the hypertension status and variational
posterior distribution of the parameters of a physics-based model. In addition,
we use a simulated dataset from a nonlinear model to compare the results from
variational inference (VI) based on privacy-preserving synthetic data, and
privacy-preserving VI obtained from directly privatizing NFs for VI with DP
guarantees given the original non-private dataset. The results suggest that
synthetic data generated through differentially private density estimation with
NF can yield good utility at a reasonable privacy cost. We also show that VI
obtained from differentially private NF based on the free energy bound loss may
produce variational approximations with significantly altered correlation
structure, and loss formulations based on alternative dissimilarity metrics
between two distributions might provide improved results.
( 2
min )
In a recent paper, Ling et al. investigated the over-parametrized Deep
Equilibrium Model (DEQ) with ReLU activation and proved that the gradient
descent converges to a globally optimal solution at a linear convergence rate
for the quadratic loss function. In this paper, we show that this fact still
holds for DEQs with any general activation which has bounded first and second
derivatives. Since the new activation function is generally non-linear, a
general population Gram matrix is designed, and a new form of dual activation
with Hermite polynomial expansion is developed.
( 2
min )
We propose a new bound for generalization of neural networks using Koopman
operators. Unlike most of the existing works, we focus on the role of the final
nonlinear transformation of the networks. Our bound is described by the
reciprocal of the determinant of the weight matrices and is tighter than
existing norm-based bounds when the weight matrices do not have small singular
values. According to existing theories about the low-rankness of the weight
matrices, it may be counter-intuitive that we focus on the case where singular
values of weight matrices are not small. However, motivated by the final
nonlinear transformation, we can see that our result sheds light on a new
perspective regarding a noise filtering property of neural networks. Since our
bound comes from Koopman operators, this work also provides a connection
between operator-theoretic analysis and generalization of neural networks.
Numerical results support the validity of our theoretical results.
( 2
min )
Here is a podcast episode with Noam Brown from Meta AI where we discuss his work on achieving human-level performance on poker and Diplomacy, as well as the power of spending compute at inference time!
submitted by /u/thejashGI
[link] [comments]
( 42
min )
I'm glad to share with you our Open Access survey paper about image super-resolution:
https://ieeexplore.ieee.org/abstract/document/10041995
The goal of this work is to give an overview of the abundance of publications in image super-resolution, give an introduction for new researchers, and open thriving discussions as well as point to potential future directions to advance the field :)
submitted by /u/Maleficent_Stay_7737
[link] [comments]
( 43
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). It indexes the documents stored in a wide range of repositories and finds the most relevant document based on the keywords or natural language questions the user has searched for. In some scenarios, you need the search results to be filtered based on […]
( 12
min )
We’re excited to announce that Amazon Personalize now lets you measure how your personalized recommendations can help you achieve your business goals. After specifying the metrics that you want to track, you can identify which campaigns and recommenders are most impactful and understand the impact of recommendations on your business metrics. All customers want to […]
( 10
min )
Love and creativity are in the air this Valentine’s Day in the NVIDIA Studio, as 3D artist Molly Brady presents a parody scene inspired by the iconic The Birth of Venus (Redux) painting by Sandro Botticelli.
( 7
min )
How a single SYCL codebase makes it possible to run multi-devices such as Intel GPUs, AMD GPUs, and NVIDIA GPUs Posted on behalf of Arti Gupta, Intel oneAPI Program Director The ever-growing scale and speed of High-Performance Computing (HPC) systems unleash many new opportunities for researchers and data scientists. Today, the first exascale-capable HPC systems,… Read More »Advancing HPC and AI through oneAPI Heterogeneous Programming in Academia and Research
The post Advancing HPC and AI through oneAPI Heterogeneous Programming in Academia and Research appeared first on Data Science Central.
( 20
min )
The world is going digital at a very fast speed. From retail shops to the cab industry to banking, all are changing and so is the healthcare industry. We can see a huge difference in the industry in terms of technology compared to ten years back. But there is a long way to go for… Read More »Top Healthcare App Development Trends That Will Dominate 2023
The post Top Healthcare App Development Trends That Will Dominate 2023 appeared first on Data Science Central.
( 22
min )
There’s no denying that we live in an app-driven world, and that’s especially true for modern businesses. Organizations use apps for almost everything. While this allows for faster communication, it can also lead to application fragmentation. App fragmentation is when an organization uses multiple applications to perform similar tasks. This creates an inefficient and disjointed… Read More »App Fragmentation & How To Avoid Siloed Communication: 3 Right Technologies for The Job
The post App Fragmentation & How To Avoid Siloed Communication: 3 Right Technologies for The Job appeared first on Data Science Central.
( 22
min )
So I just uploaded a devlog about my bullet-dodging AI game. I discuss how I trained a Reinforcement Learning agent to learn to dodge bullets using Unity's ML-Agents package! The goal of the next devlog is to extend this to a 2-player setting, where a human player competes against a trained AI player to dodge/shoot bullets! I will probably be doing some MARL with self-play to achieve this, but this video is a single-agent setting.
I'm a baby Youtuber, so I appreciate yall for checking it out!
https://youtu.be/l9geEcn-A6Q
submitted by /u/AvvYaa
[link] [comments]
( 41
min )
This post is co-written by Zdenko Estok, Cloud Architect at Accenture and Sakar Selimcan, DeepRacer SME at Accenture. With the increasing use of artificial intelligence (AI) and machine learning (ML) for a vast majority of industries (ranging from healthcare to insurance, from manufacturing to marketing), the primary focus shifts to efficiency when building and training […]
( 8
min )
The method enables a model to determine its confidence in a prediction, while using no additional data and far fewer computing resources than other methods.
( 9
min )
Using available off-the-shelf AI services, I ended up making this video. I walk through the process and discuss some implications.
Here is the process that I followed
Asked ChatGPT to create a script
Asked a text-to-speech generative AI to convert the script into audio
Asked MidJourney to create an avatar of a narrator
Asked an audio-to-video generative AI to generate video from the avatar and audio.
https://ithinkbot.com/make-end-to-end-video-using-generative-ai-totally-free-try-it-out-dadee18302de
submitted by /u/Opitmus_Prime
[link] [comments]
( 41
min )
Hi guys,
I have made a video on YouTube here where I explain how we can measure the fairness of a machine learning model by using the disparate impact score.
I hope it may be of use to some of you out there. As always, feedback is more than welcomed! :)
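For readers who just want the formula: disparate impact is the ratio of positive-prediction rates between the unprivileged and privileged groups (the toy data below is made up purely for illustration):

```python
def disparate_impact(y_pred, group):
    """Disparate impact: positive-outcome rate of the unprivileged group (0)
    divided by that of the privileged group (1). Values near 1 are fairer;
    the common '80% rule' flags ratios below 0.8."""
    pos_unpriv = sum(p for p, g in zip(y_pred, group) if g == 0)
    n_unpriv = sum(1 for g in group if g == 0)
    pos_priv = sum(p for p, g in zip(y_pred, group) if g == 1)
    n_priv = sum(1 for g in group if g == 1)
    return (pos_unpriv / n_unpriv) / (pos_priv / n_priv)

# 2/4 positives for group 0 vs 3/4 for group 1 -> ratio 2/3, below 0.8
print(disparate_impact([1, 0, 1, 0, 1, 1, 1, 0], [0, 0, 0, 0, 1, 1, 1, 1]))
```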
submitted by /u/Personal-Trainer-541
[link] [comments]
( 41
min )
Hi, I have just come across an AI course provided by OpenCV. It covers a lot of computer vision material, but it costs $1599. Is anyone taking it? Any comments? Should I bet on this for a career change?
P.S. I have some basic programming knowledge and engineering background.
Here is the link to their course page.
https://opencv.org/courses/
submitted by /u/sumofjack
[link] [comments]
( 41
min )
If you want any more proof about how much AI has integrated itself into our daily lives, go no further than the map on your smart phone. Whether you use Google Maps or Apple Maps or Waze (also owned by Google), these AI-infused apps are amazing at getting you from Point A to Point B… Read More »AI Effectiveness Starts by Understanding User Intent
The post AI Effectiveness Starts by Understanding User Intent appeared first on Data Science Central.
( 22
min )
Incident management for cloud services is a complex process involving several
steps and has a huge impact on both service health and developer productivity.
On-call engineers require a significant amount of domain knowledge and manual
effort to root-cause and mitigate production incidents. Recent advances
in artificial intelligence have resulted in state-of-the-art large language
models like GPT-3.x (both GPT-3.0 and GPT-3.5), which have been used to solve a
variety of problems ranging from question answering to text summarization. In
this work, we do the first large-scale study to evaluate the effectiveness of
these models for helping engineers root cause and mitigate production
incidents. We do a rigorous study at Microsoft, on more than 40,000 incidents
and compare several large language models in zero-shot, fine-tuned and
multi-task settings using semantic and lexical metrics. Lastly, our human
evaluation with actual incident owners shows the efficacy and future potential
of using artificial intelligence for resolving cloud incidents.
( 2
min )
...Instead, it's introducing a new way for people to access the same information, one that could put a major dent in its market share (it's almost 85% right now).
And Satya says he's willing to accept a "decrease in margins" of the Search business.
https://www.thestatuscode.co/p/the-ultimate-guide-to-the-ai-war
submitted by /u/pyactee
[link] [comments]
( 41
min )
I really want to play with the repo but I'm stuck at the last step of the instructions (https://github.com/lucidrains/musiclm-pytorch#usage-1). If anyone has tips, please let me know!
Here's the issue I have: https://github.com/lucidrains/musiclm-pytorch/issues/13
submitted by /u/BackgroundPass2082
[link] [comments]
( 42
min )
MIT spinout Verta offers tools to help companies introduce, monitor, and manage machine-learning models safely and at scale.
( 10
min )
This post is co-written with Jonathan Jung, Mike Band, Michael Chi, and Thompson Bliss at the National Football League. A coverage scheme refers to the rules and responsibilities of each football defender tasked with stopping an offensive pass. It is at the core of understanding and analyzing any football defensive strategy. Classifying the coverage scheme […]
( 14
min )
The metaverse, a term popularised by science fiction, refers to a shared virtual space where users can interact with each other in a virtual environment. It’s a convergence of real and virtual worlds, creating a new reality that exists simultaneously with the physical world. With the rapid advancement of technology, particularly in the field of… Read More »Metaverse Development: Building the Future of Virtual Reality
The post Metaverse Development: Building the Future of Virtual Reality appeared first on Data Science Central.
( 20
min )
The following guide provides an independent review of how well this OpenAI detection software performs and how its capabilities stack up against competitors (for finding AI-generated text and plagiarism): OpenAI Text Classifier: ChatGPT’s Own AI Detection - Review
submitted by /u/thumbsdrivesmecrazy
[link] [comments]
( 41
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Signatures is a feature within Amazon Textract that offers the ability to automatically detect signatures on any document. This can reduce the need for human review, custom code, or ML experience. In this post, […]
( 7
min )
Earth’s changing climate poses an increased risk of drought due to global warming. Since 1880, the global temperature has increased 1.01 °C. Since 1993, sea levels have risen 102.5 millimeters. Since 2002, the land ice sheets in Antarctica have been losing mass at a rate of 151.0 billion metric tons per year. In 2022, the […]
( 10
min )
The chatbot’s success on the medical licensing exam shows that the test — and medical education — are flawed, Celi says.
( 8
min )
I'd like to hear what you guys think about this approach.
submitted by /u/ThePerson654321
[link] [comments]
( 43
min )
Electric automaker XPENG’s flagship G9 SUV and P7 sports sedan are now available for order in Sweden, Denmark, Norway and the Netherlands — an expansion revealed last week at the eCar Expo in Stockholm. The intelligent electric vehicles are built on the high-performance NVIDIA DRIVE Orin centralized compute architecture and deliver AI capabilities that are Read article >
( 5
min )
Designing automotive visualizations can be incredibly time consuming. To make the renders look as realistic as possible, artists need to consider material textures, paints, realistic lighting and reflections, and more. For 3D artist David Baylis, it’s important to include these details and still create high-resolution renders in a short amount of time. That’s why he Read article >
( 6
min )
Venture to the Forgotten Realms this GFN Thursday in Baldur’s Gate 3, streaming on GeForce NOW. Celebrations for the cloud gaming service’s third anniversary continue with a Dying Light 2 reward that’s to die for. It’s the cherry on top of three new titles joining the GeForce NOW library this week. Roll for Initiative Mysterious Read article >
( 5
min )
Machine learning (ML) has become ubiquitous. Our customers are employing ML in every aspect of their business, including the products and services they build, and for drawing insights about their customers. To build an ML-based application, you have to first build the ML model that serves your business requirement. Building ML models involves preparing the […]
( 15
min )
The first NVIDIA Studio laptops powered by GeForce RTX 40 Series Laptop GPUs are now available, starting with systems from MSI and Razer — with many more to come.
( 8
min )
Critical applications, such as in the medical field, require the rapid
provision of additional information to interpret decisions made by deep
learning methods. In this work, we propose a fast and accurate method to
visualize activations of classification and semantic segmentation networks by
stitching them with a GAN generator utilizing convolutions. We test our
approach on images of animals from the AFHQ wild dataset and real-world digital
pathology scans of stained tissue samples. Our method provides comparable
results to established gradient descent methods on these datasets while running
about two orders of magnitude faster.
( 2
min )
We study online Reinforcement Learning (RL) in non-stationary input-driven
environments, where a time-varying exogenous input process affects the
environment dynamics. Online RL is challenging in such environments due to
catastrophic forgetting (CF). The agent tends to forget prior knowledge as it
trains on new experiences. Prior approaches to mitigate this issue assume task
labels (which are often not available in practice) or use off-policy methods
that can suffer from instability and poor performance.
We present Locally Constrained Policy Optimization (LCPO), an on-policy RL
approach that combats CF by anchoring policy outputs on old experiences while
optimizing the return on current experiences. To perform this anchoring, LCPO
locally constrains policy optimization using samples from experiences that lie
outside of the current input distribution. We evaluate LCPO in two gym and
computer systems environments with a variety of synthetic and real input
traces, and find that it outperforms state-of-the-art on-policy and off-policy
RL methods in the online setting, while achieving results on-par with an
offline agent pre-trained on the whole input trace.
( 2
min )
Bilevel optimization has been developed for many machine learning tasks with
large-scale and high-dimensional data. This paper considers a constrained
bilevel optimization problem, where the lower-level optimization problem is
convex with equality and inequality constraints and the upper-level
optimization problem is non-convex. The overall objective function is
non-convex and non-differentiable. To solve the problem, we develop a
gradient-based approach, called gradient approximation method, which determines
the descent direction by computing several representative gradients of the
objective function inside a neighborhood of the current estimate. We show that
the algorithm asymptotically converges to the set of Clarke stationary points,
and demonstrate the efficacy of the algorithm by the experiments on
hyperparameter optimization and meta-learning.
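The core step, averaging several gradients sampled in a neighborhood to obtain a descent direction for a nonsmooth objective, can be sketched as follows (a simplified Goldstein-style illustration, not the paper's exact algorithm):

```python
import numpy as np

def approx_descent_dir(grad_fn, x, radius=0.1, k=32, seed=0):
    """Average gradients sampled uniformly in a box around x: a simple
    Goldstein-style surrogate direction for a nonsmooth objective."""
    rng = np.random.default_rng(seed)
    pts = x + radius * (2.0 * rng.random((k, x.size)) - 1.0)
    return np.mean([grad_fn(p) for p in pts], axis=0)

# Nonsmooth example f(x) = |x1| + |x2|, whose gradient is sign(x) a.e.
g = approx_descent_dir(np.sign, np.array([1.0, -1.0]))
print(g)  # [ 1. -1.]: every sampled point shares the sign pattern of x
```

Away from the kinks of the objective, all sampled gradients agree and the average reduces to the ordinary gradient; near a kink the average blends the gradients on either side.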
( 2
min )
Contrary to its original interpretation as a facilitator of knowledge
transfer from one model to another, some recent studies have suggested that
knowledge distillation (KD) is instead a form of regularization. Perhaps the
strongest support of all for this claim is found in its apparent similarities
with label smoothing (LS). This paper investigates the stated equivalence of
these two methods by examining the predictive uncertainties of the models they
train. Experiments on four text classification tasks involving teachers and
students of different capacities show that: (a) In most settings, KD and LS
drive model uncertainty (entropy) in completely opposite directions, and (b) In
KD, the student's predictive uncertainty is a direct function of that of its
teacher, reinforcing the knowledge transfer view.
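The entropy comparison is easy to reproduce on the LS side: label smoothing raises the entropy of the training targets by construction. A minimal sketch (class count and smoothing strength are arbitrary):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def smooth_label(num_classes, true_class, eps=0.1):
    """Label smoothing: (1 - eps) mass on the true class plus eps spread
    uniformly over all classes."""
    t = np.full(num_classes, eps / num_classes)
    t[true_class] += 1.0 - eps
    return t

print(entropy(smooth_label(5, 2, eps=0.0)))  # zero entropy: one-hot target
print(entropy(smooth_label(5, 2, eps=0.1)))  # > 0: smoothing raises entropy
```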
( 2
min )
This work investigates the intersection of cross-modal learning and
semi-supervised learning, where we aim to improve the supervised learning
performance of the primary modality by borrowing missing information from an
unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW)
kernel regression perspective and show that this formulation implicitly leads
to a kernelized cross-attention module. To this end, we propose The Attention
Patch (TAP), a simple neural network plugin that allows data-level knowledge
transfer from the unlabeled modality. We provide numerical simulations on three
real-world datasets to examine each aspect of TAP and show that a TAP
integration in a neural network can improve generalization performance using
the unlabeled modality.
( 2
min )
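The NW-regression-as-attention connection can be sketched as follows (a minimal illustration with a Gaussian kernel, not the TAP module itself): the kernel weights are exactly a softmax over negative squared distances, i.e. cross-attention.

```python
import numpy as np

def nw_cross_attention(queries, keys, values, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel, written as
    cross-attention: weights = softmax(-||q - k||^2 / (2 h^2)),
    output = weights @ values."""
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2.0 * bandwidth ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ values

# Regress y = sin(2*pi*x) from 50 (key, value) pairs; query at x = 0.25.
keys = np.linspace(0, 1, 50)[:, None]
values = np.sin(2 * np.pi * keys[:, 0])[:, None]
pred = nw_cross_attention(np.array([[0.25]]), keys, values, bandwidth=0.05)
```

The prediction is a kernel-weighted average of the values near the query, which is what a cross-attention head computes with learned similarities in place of the fixed Gaussian kernel.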
There has been much recent progress in forecasting the next observation of a
linear dynamical system (LDS), which is known as improper learning, as well
as in the estimation of its system matrices, which is known as proper
learning of an LDS. We present an approach to proper learning of an LDS which, in
spite of the non-convexity of the problem, guarantees global convergence of
numerical solutions to a least-squares estimator. We present promising
computational results.
( 2
min )
We present a variety of novel information-theoretic generalization bounds for
learning algorithms in the supersample setting of Steinke & Zakynthinou
(2020), the setting of the "conditional mutual information" framework. Our
development exploits projecting the loss pair (obtained from a training
instance and a testing instance) down to a single number and correlating loss
values with a Rademacher sequence (and its shifted variants). The presented
bounds include square-root bounds and fast-rate bounds, including those based on
variance and sharpness, as well as bounds for interpolating algorithms. We show
theoretically or empirically that these bounds are tighter than all
information-theoretic bounds known to date in the same supersample setting.
( 2
min )
Message Passing Neural Networks (MPNNs) are instances of Graph Neural
Networks that leverage the graph to send messages over the edges. This
inductive bias leads to a phenomenon known as over-squashing, where a node
feature is insensitive to information contained at distant nodes. Despite
recent methods introduced to mitigate this issue, an understanding of the
causes of over-squashing, and of possible solutions, is still lacking. In this
theoretical work, we prove that: (i) neural network width can mitigate
over-squashing, but at the cost of making the whole network more sensitive;
(ii) conversely, depth cannot help mitigate over-squashing: increasing the
number of layers leads to over-squashing being dominated by vanishing
gradients; (iii) the graph topology plays the greatest role, since
over-squashing occurs between nodes at high commute (access) time. Our analysis
provides a unified framework to study different recent methods introduced to
cope with over-squashing and serves as a justification for a class of methods
that fall under 'graph rewiring'.
( 2
min )
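The commute time driving point (iii) can be computed from the pseudoinverse of the graph Laplacian; a small sketch (illustrative, not taken from the paper):

```python
import numpy as np

def commute_times(adj):
    """Commute times between all node pairs via the Laplacian pseudoinverse:
    C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv)."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj
    Lp = np.linalg.pinv(L)
    vol = deg.sum()
    d = np.diag(Lp)
    return vol * (d[:, None] + d[None, :] - 2 * Lp)

# Path graph on 5 nodes: the endpoints have the largest commute time,
# so they are the pair most prone to over-squashing in this analysis.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1
C = commute_times(A)
```

For a path with 4 edges, vol(G) = 8 and the effective resistance between the endpoints is 4, so their commute time is 32, the maximum over all pairs.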
This work studies the pure-exploration setting for the convex hull
feasibility (CHF) problem where one aims to efficiently and accurately
determine if a given point lies in the convex hull of means of a finite set of
distributions. We give a complete characterization of the sample complexity of
the CHF problem in the one-dimensional setting. We present the first
asymptotically optimal algorithm called Thompson-CHF, whose modular design
consists of a stopping rule and a sampling rule. In addition, we provide an
extension of the algorithm that generalizes several important problems in the
multi-armed bandit literature. Finally, we further investigate the Gaussian
bandit case with unknown variances and address how the Thompson-CHF algorithm
can be adjusted to be asymptotically optimal in this setting.
( 2
min )
The recipe behind the success of deep learning has been the combination of
neural networks and gradient-based optimization. Understanding the behavior of
gradient descent, however, and particularly of its instability, has lagged behind
its empirical success. To add to the theoretical tools available to study
gradient descent, we propose the principal flow (PF), a continuous-time flow
that approximates gradient descent dynamics. To our knowledge, the PF is the
only continuous flow that captures the divergent and oscillatory behaviors of
gradient descent, including escaping local minima and saddle points. Through
its dependence on the eigendecomposition of the Hessian, the PF sheds light on
the recently observed edge-of-stability phenomenon in deep learning. Using our
new understanding of instability we propose a learning rate adaptation method
which enables us to control the trade-off between training stability and test
set evaluation performance.
( 2
min )
https://www.theverge.com/2023/2/7/23587454/microsoft-bing-edge-chatgpt-ai
submitted by /u/currentscurrents
[link] [comments]
( 44
min )
From Article:
Getty Images' new lawsuit claims that Stability AI, the company behind the Stable Diffusion AI image generator, stole 12 million Getty images, with their captions, metadata, and copyrights, "without permission" to "train its Stable Diffusion algorithm."
The company has asked the court to order Stability AI to remove the infringing images from its website and to pay $150,000 for each violation.
However, it would be difficult to prove all the violations. Getty submitted over 7,000 images, along with metadata and copyright registrations, that it says were used by Stable Diffusion.
submitted by /u/vadhavaniyafaijan
[link] [comments]
( 49
min )
📢 News 📢
Pythae 0.1.0 is now out and supports distributed training using PyTorch DDP!
Train your favorite Variational Autoencoders (VAEs) faster 🏎️ and on larger datasets, still with just a few lines of code 🖥️.
👉github: https://github.com/clementchadebec/benchmark_VAE
👉pypi: https://pypi.org/project/pythae/
submitted by /u/cchad-8
[link] [comments]
( 43
min )
Hey guys, I’m the co-founder of a tech startup focused on providing free AI services. We’re one of the first mobile multipurpose AI apps.
We’ve developed a pretty cool app that offers AI services like image generation, code generation, image captioning, and more for free. We’re sort of like a Swiss Army knife of generative and analytical AI.
We’ve released a new feature called AAIA (Ask AI Anything), which is capable of answering all types of questions, even requests to generate literature, story-lines, jokes, general information, etc.
We’d love to have some people try it out, give us feedback, and keep in touch with us.
https://apps.apple.com/us/app/bright-eye/id1593932475
submitted by /u/BrightEyeuser
[link] [comments]
( 41
min )
https://medium.com/seeds-for-the-future/the-next-step-for-generative-ai-830112890d04?sk=1d6b4c96cc6cb0a4690bcf9df0d12bcc
submitted by /u/arnolds112
[link] [comments]
( 40
min )
This post is co-written with Stephen Aylward, Matt McCormick, Brianna Major from Kitware and Justin Kirby from the Frederick National Laboratory for Cancer Research (FNLCR). Amazon SageMaker Studio Lab provides no-cost access to a machine learning (ML) development environment to everyone with an email address. Like the fully featured Amazon SageMaker Studio, Studio Lab allows […]
( 8
min )
Amazon SageMaker has announced the support of three new completion criteria for Amazon SageMaker automatic model tuning, providing you with an additional set of levers to control the stopping criteria of the tuning job when finding the best hyperparameter configuration for your model. In this post, we discuss these new completion criteria, when to use them, and […]
( 8
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Announcements Machine Learning Controversy: From No-Code to No-Math One controversial topic in machine learning circles is code versus no-code. Can you be a real data scientist if you don’t code? Of course you can: You may be leveraging platforms and the code is one or two layers below the responsibilities of your job. Maybe you… Read More »DSC Weekly 7 February 2023 – Machine Learning Controversy: From No-Code to No-Math
The post DSC Weekly 7 February 2023 – Machine Learning Controversy: From No-Code to No-Math appeared first on Data Science Central.
( 21
min )
Data labeling and/or data annotation has long been a critical component of many machine learning and AI initiatives. In recent years, the demand for accurate and reliable data labeling has risen dramatically as the process becomes increasingly vital to the success of numerous projects. But what is data labeling exactly? Data Labeling 2023 – how… Read More »The Impact of Data Labeling 2023: Current Trends & Future Demands
The post The Impact of Data Labeling 2023: Current Trends & Future Demands appeared first on Data Science Central.
( 22
min )
Mobile Apps to Develop Your Data Science Skills -Mobile phones are the most preferred medium of accomplishing minute-to-minutest tasks on a daily basis. We don’t need to visit any particular restaurant to take away the food, we can do this by just sitting on our favorite couch at home, thanks to food ordering apps. Not… Read More »Best 9 Mobile Apps to Develop Your Data Science Skills in 2023
The post Best 9 Mobile Apps to Develop Your Data Science Skills in 2023 appeared first on Data Science Central.
( 23
min )
Doctors rarely make diagnoses based on a single factor — they look at a mix of data types, such as a patient’s symptoms, laboratory and radiology reports, and medical history. VinBrain, a Vietnam-based health-tech startup, is ensuring that AI diagnostics can take a similarly holistic view across vital signs, blood tests, medical images and more. Read article >
( 6
min )
AI Seinfeld Transphobic rant - YouTube
submitted by /u/Status_Signal_4083
[link] [comments]
( 42
min )
I have made a Stack Overflow post here. I will highly appreciate all your help on this. Thank you!
submitted by /u/Academic-Rent7800
[link] [comments]
( 42
min )
It took me about 46 hours to run this on my 3080 at home. The original file was from the Blu-ray release, which was unfortunately pretty poorly done in my opinion. This version really gives it new life, I think.
Here's a link to the video result to see for yourself:
https://vimeo.com/796411232
And a link to the model I used!
https://github.com/TencentARC/AnimeSR
submitted by /u/VR_Angel
[link] [comments]
( 43
min )
https://blog.google/technology/ai/bard-google-ai-search-updates/
submitted by /u/EducationalCicada
[link] [comments]
( 50
min )
From the article:
Getty Images has filed a lawsuit in the US against Stability AI, creators of open-source AI art generator Stable Diffusion, escalating its legal battle against the firm.
The stock photography company is accusing Stability AI of “brazen infringement of Getty Images’ intellectual property on a staggering scale.” It claims that Stability AI copied more than 12 million images from its database “without permission ... or compensation ... as part of its efforts to build a competing business,” and that the startup has infringed on both the company’s copyright and trademark protections.
This is different from the UK-based news from weeks ago.
submitted by /u/Wiskkey
[link] [comments]
( 44
min )
I made image captioning and clustering tools for computer vision and diffusion projects.
You can run almost everything automatically and with a simple CLI command. All contributions are welcome.
https://github.com/cobanov/image-clustering
https://github.com/cobanov/image-captioning
submitted by /u/metover
[link] [comments]
( 42
min )
A new tool brings the benefits of AI programming to a much broader class of problems.
( 8
min )
This blog post is co-written with Bruno Mateus, Jonathan Diedrich and Crispim Tribuna at Talkdesk. Contact centers are using artificial intelligence (AI) and natural language processing (NLP) technologies to build a personalized customer experience and deliver effective self-service support through conversational bots. This is the first of a two-part series dedicated to the integration of […]
( 8
min )
Researchers continue to develop new model architectures for common machine learning (ML) tasks. One such task is image classification, where images are accepted as input and the model attempts to classify the image as a whole with object label outputs. With many models available today that perform this image classification task, an ML practitioner may […]
( 11
min )
“I’ll tell you the problem with the scientific power that you’re using here: it didn’t require any discipline to attain it. You read what others had done and you took the next step. You didn’t earn the knowledge for yourselves, so you don’t take any responsibility for it. You stood on the shoulders of geniuses… Read More »It’s No Big Deal, but ChatGPT Changes Everything – Part III
The post It’s No Big Deal, but ChatGPT Changes Everything – Part III appeared first on Data Science Central.
( 24
min )
Just a few days ago, January 28, we celebrated Data Protection Day, an international event aimed at promoting data privacy and security. In line with the goal of raising awareness about data protection, it would be a good time to discuss data security with Realtime Operating System. This unconventional operating system is widely used, so… Read More »Ensuring Data Security in Realtime Operating System (RTOS) Devices
The post Ensuring Data Security in Realtime Operating System (RTOS) Devices appeared first on Data Science Central.
( 21
min )
A University of Toronto undergrad among an international team of researchers unleashing deep learning in the search for extraterrestrial civilizations.
( 6
min )
Tweet thread: https://twitter.com/WholeMarsBlog/status/1622139178439036928
First impressions: this sucks ass. I can only ask about dogs and a few different types of prompts.
Does anyone else have experiences to share with this nerfed LaMDA beta google released?
submitted by /u/That_Violinist_18
[link] [comments]
( 44
min )
https://youtu.be/ktdUeqzzhiA What text-to-speech does he use? He's been popping up on my YT feed lately, and I can see he has different voices in his videos; most of them sound robotic. What do you think is being used here?
submitted by /u/candidhorse4
[link] [comments]
( 42
min )
How can we move from an idea to production in AI?
Do technology readiness levels (TRLs) help?
If you want some answers, please read this article on Medium:
https://medium.com/towards-artificial-intelligence/technology-readiness-levels-trl-in-ai-development-c6ed1190fbd6
All the ideas are more than welcome!
submitted by /u/Nice-Tomorrow2926
[link] [comments]
( 40
min )
Hi all,
For my weekend project I figured I would build an AI driven spiritual successor to Mystery Science Theater 3000... Stop on by and watch the AI characters watch movies and make comments!
Today they are watching "The House on Haunted Hill" and "Plan 9 From Outer Space."
There's still a lot to do but I'm excited to play around with this more and see how it plays out and would love some feedback!
https://twitch.tv/MysteryAItheater
submitted by /u/caseigl
[link] [comments]
( 42
min )
https://www.udemy.com/course/chatgpt-bot/?couponCode=5-DAYS-FREE
Hey everyone, I recently made a course about ChatGPT as a fun passion project. This is for anyone who wants to learn how to create automated workflows (using Chrome extensions) with ChatGPT. Specifically, you will create a ChatGPT bot that automatically answers your emails. It is beginner friendly and includes getting some good practice with JavaScript. I hope you enjoy it and I'm looking forward to your feedback/questions :)
submitted by /u/neuromodel
[link] [comments]
( 41
min )
https://www.youtube.com/watch?v=8TOgN-U0ask&t=1s
After the Lensa AI controversy led many people to question whether AI really is creative or is just "remixing" other artists' copyrighted work used without permission, many have wondered whether AI trained on copyrighted images should be illegal. This talk makes some interesting comparisons, which might just mean the answer is no.
submitted by /u/BearNo21
[link] [comments]
( 41
min )
From the Financial Times: https://www.ft.com/content/583ead66-467c-4bd5-84d0-ed5df7b5bf9c
Unpaywalled: https://archive.is/ciZPV
I guess I'm a little surprised; this feels like Google backing a competitor to 1) their own Google Brain teams and 2) DeepMind. The cynical take might be that they're trying to lock in Anthropic the same way Microsoft locked in OpenAI.
submitted by /u/bikeskata
[link] [comments]
( 47
min )
Github: https://github.com/google/vizier
Google AI Blog: https://ai.googleblog.com/2023/02/open-source-vizier-towards-reliable-and.html
Tweet from Zoubin Ghahramani: https://twitter.com/ZoubinGhahrama1/status/1621321675936768000?s=20&t=ZEuz9oSc_GWYxixtXDskqA
submitted by /u/enderlayer
[link] [comments]
( 43
min )
In this article (https://dallasinnovates.com/exclusive-qa-john-carmacks-different-path-to-artificial-general-intelligence/) there is a quote from John Carmack that read: "I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today. "
My question is, what are these 40 papers?
submitted by /u/Gryphx
[link] [comments]
( 42
min )
Can someone please help with this question - https://ai.stackexchange.com/questions/39029/why-does-advantage-learning-help-function-approximators
submitted by /u/Academic-Rent7800
[link] [comments]
( 43
min )
This effort is focused on examining the behavior of reinforcement learning
systems in personalization environments and detailing the differences in policy
entropy associated with the type of learning algorithm utilized. We demonstrate
that Policy Optimization agents often possess low-entropy policies during
training, which in practice results in agents prioritizing certain actions and
avoiding others. Conversely, we also show that Q-Learning agents are far less
susceptible to such behavior and generally maintain high-entropy policies
throughout training, which is often preferable in real-world applications. We
provide a wide range of numerical experiments as well as theoretical
justification to show that these differences in entropy are due to the type of
learning being employed.
( 2
min )
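The entropy gap described above can be illustrated with toy policies (hypothetical numbers, not the paper's agents): a sharply peaked softmax policy, as a Policy Optimization agent might learn, versus an epsilon-greedy policy over Q-values.

```python
import numpy as np

def policy_entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def softmax_policy(prefs):
    z = prefs - prefs.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def eps_greedy_policy(q, eps=0.3):
    # Q-Learning-style behavior policy: mostly greedy, uniform with prob eps.
    n = q.shape[-1]
    p = np.full_like(q, eps / n, dtype=float)
    p[np.arange(len(q)), q.argmax(axis=-1)] += 1.0 - eps
    return p

prefs = np.array([[8.0, 0.0, 0.0, 0.0]])   # sharply peaked action preferences
q = np.array([[1.0, 0.9, 0.8, 0.7]])
h_po = policy_entropy(softmax_policy(prefs))  # low entropy: actions avoided
h_q = policy_entropy(eps_greedy_policy(q))    # higher entropy: exploration kept
```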
Learning-based behavior prediction methods are increasingly being deployed in
real-world autonomous systems, e.g., in fleets of self-driving vehicles, which
are beginning to commercially operate in major cities across the world. Despite
their advancements, however, the vast majority of prediction systems are
specialized to a set of well-explored geographic regions or operational design
domains, complicating deployment to additional cities, countries, or
continents. Towards this end, we present a novel method for efficiently
adapting behavior prediction models to new environments. Our approach leverages
recent advances in meta-learning, specifically Bayesian regression, to augment
existing behavior prediction models with an adaptive layer that enables
efficient domain transfer via offline fine-tuning, online adaptation, or both.
Experiments across multiple real-world datasets demonstrate that our method can
efficiently adapt to a variety of unseen environments.
( 2
min )
The higher speed, scalability, and parallelism offered by ReRAM crossbar
arrays foster the development of ReRAM-based next-generation AI accelerators. At
the same time, the sensitivity of ReRAM to temperature variations decreases the
R_on/R_off ratio and negatively affects the accuracy and reliability achieved by
the hardware. Various works on temperature-aware optimization and remapping in
ReRAM crossbar arrays have reported up to 58% improvement in accuracy and a
2.39$\times$ enhancement in ReRAM lifetime. This paper classifies the challenges
caused by thermal heat, starting from constraints in ReRAM cells' dimensions
and characteristics to their placement in the architecture. In addition, it
reviews available solutions designed to mitigate the impact of these
challenges, including emerging temperature-resilient DNN training methods. Our
work also provides a summary of the techniques and their advantages and
limitations.
( 2
min )
Hierarchical Clustering is a popular unsupervised machine learning method
with decades of history and numerous applications. We initiate the study of
differentially private approximation algorithms for hierarchical clustering
under the rigorous framework introduced by (Dasgupta, 2016). We show strong
lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit
$O(|V|^2/ \epsilon)$-additive error for an input dataset $V$. Then, we exhibit
a polynomial-time approximation algorithm with $O(|V|^{2.5}/
\epsilon)$-additive error, and an exponential-time algorithm that meets the
lower bound. To overcome the lower bound, we focus on the stochastic block
model, a popular model of graphs, and, with a separation assumption on the
blocks, propose a private $1+o(1)$ approximation algorithm which also recovers
the blocks exactly. Finally, we perform an empirical study of our algorithms
and validate their performance.
( 2
min )
Generative adversarial networks (GANs) have many application areas including
image editing, domain translation, missing data imputation, and support for
creative work. However, GANs are considered 'black boxes'. Specifically, the
end-users have little control over how to improve editing directions through
disentanglement. Prior work focused on new GAN architectures to disentangle
editing directions. Alternatively, we propose GANravel, a user-driven
direction-disentanglement tool that complements existing GAN architectures and allows
users to improve editing directions iteratively. In two user studies with 16
participants each, GANravel users were able to disentangle directions and
outperformed the state-of-the-art direction discovery baselines in
disentanglement performance. In the second user study, GANravel was used in a
creative task of creating dog memes and was able to create high-quality edited
images and GIFs.
( 2
min )
Sparseness and robustness are two important properties in many machine
learning scenarios. In the present study, we investigate integrating the
maximum correntropy criterion (MCC) based robust regression algorithm with the
automatic relevance determination (ARD) technique in a
Bayesian framework, so that MCC-based robust regression can be implemented
with adaptive sparseness. To be specific, we use an inherent noise assumption
from the MCC to derive an explicit likelihood function, and realize the maximum
a posteriori (MAP) estimation with the ARD prior by variational Bayesian
inference. Compared to the existing robust and sparse L1-regularized MCC
regression, the proposed MCC-ARD regression eliminates the troublesome
tuning of the regularization hyperparameter that controls the regularization
strength. Further, MCC-ARD achieves superior prediction performance and
feature-selection capability relative to L1-regularized MCC, as demonstrated by
a noisy, high-dimensional simulation study.
( 2
min )
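The bounded-influence property of the MCC can be sketched with a simple half-quadratic (IRLS) solver (an illustrative simplification; the paper's method is Bayesian, with an ARD prior and variational inference):

```python
import numpy as np

def correntropy_loss(residuals, sigma=1.0):
    """Maximum correntropy criterion loss: a Gaussian-kernel loss that
    saturates for large residuals, so outliers have bounded influence."""
    return 1.0 - np.exp(-(residuals ** 2) / (2.0 * sigma ** 2))

def mcc_regression(X, y, sigma=1.0, iters=50):
    # Half-quadratic view of MCC: reweighted least squares with
    # weights w_i = exp(-r_i^2 / (2 sigma^2)), so outliers are downweighted.
    w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(X.shape[1]), Xw.T @ y)
        r = y - X @ beta
        w = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])
y = X @ np.array([1.0, 2.0]) + 0.05 * rng.standard_normal(50)
y[:5] += 10.0                      # gross outliers
beta = mcc_regression(X, y, sigma=0.5)
```

Unlike squared loss, the correntropy weights drive the outliers' influence to essentially zero, so the fit recovers the true coefficients; the sparsity part (which MCC-ARD adds via the ARD prior) is not modeled here.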
We quantify the parameter stability of a spherical Gaussian mixture model
(sGMM) under small perturbations in distribution space. Namely, we derive the
first explicit bound showing that, for a spherical Gaussian mixture $P$
(sGMM) in a pre-defined model class, any other sGMM in this model
class that is close to $P$ in total variation distance also has a small parameter
distance to $P$. Further, this upper bound depends only on $P$. The motivation
for this work lies in providing guarantees for fitting Gaussian mixtures; with
this aim in mind, all the constants involved are well defined, and the
conditions for fitting mixtures of spherical Gaussians are distribution-free.
Our results tighten considerably the existing computable bounds, and
asymptotically match the known sharp thresholds for this problem.
( 2
min )
Today, the NFL is continuing their journey to increase the number of statistics provided by the Next Gen Stats Platform to all 32 teams and fans alike. With advanced analytics derived from machine learning (ML), the NFL is creating new ways to quantify football, and to provide fans with the tools needed to increase their […]
( 10
min )
The National Football League (NFL) is one of the most popular sports leagues in the United States and is the most valuable sports league in the world. The NFL, BioCore, and AWS are committed to advancing human understanding around the diagnosis, prevention, and treatment of sports-related injuries to make the game of football safer. More […]
( 10
min )
I wanted to use the Learnable Triangulation model in a commercial project. The source code itself is under an MIT license. However, the dataset they used is Human3.6M, which states that the license is "FREE OF CHARGE FOR ACADEMIC USE ONLY".
Yet, recent court rulings (in the US) state that models can use copyrighted data during training, and the results are no longer bound by that copyright (e.g. Google Books). Does the same apply here?
submitted by /u/mfarahmand98
[link] [comments]
( 42
min )
Cheers to another year of cloud gaming! GeForce NOW celebrates its third anniversary with a look at how far cloud gaming has come, a community celebration and 25 new games supported in February. Members can celebrate all month long, starting with a sweet Dying Light 2 reward and support for nine more games this week, Read article >
( 7
min )
NVIDIA A100 Tensor Core GPUs running on Supermicro servers have captured leading results for inference in the latest STAC-ML Markets benchmark, a key technology performance gauge for the financial services industry. The results show NVIDIA demonstrating unrivaled throughput — serving up thousands of inferences per second on the most demanding models — and top latency Read article >
( 6
min )
For several years, NVIDIA has been working with some of the world’s leading financial institutions to develop and execute a wide range of rapidly evolving AI strategies. For the past three years, we’ve asked them to tell us collectively what’s on the top of their minds. Sometimes the results are just what we thought they’d Read article >
( 6
min )
submitted by /u/Alyx1337
[link] [comments]
( 40 min )
https://www.axios.com/2023/02/01/chatgpt-subscriptions-chatbot-openai
Not fully paywalled, but there's a tiering system.
submitted by /u/bikeskata
[link] [comments]
( 42 min )
submitted by /u/rafs2006
[link] [comments]
( 41 min )
submitted by /u/much_successes
[link] [comments]
( 40 min )
submitted by /u/ExperienceKCC
[link] [comments]
( 40 min )
submitted by /u/Calatravo
[link] [comments]
( 40 min )
submitted by /u/GlobeOpinion
[link] [comments]
( 40 min )
In this blog post, we will take a closer look at the implications of ChatGPT’s authorship, the role of AI in scientific literature, and…
Continue reading on Becoming Human: Artificial Intelligence Magazine »
( 8 min )
Linear & Logistic: The Relationship Between Regression Models
Continue reading on Becoming Human: Artificial Intelligence Magazine »
( 11 min )
Hello and welcome to the blog! My name is ChatGPT, and I am a large language model trained by OpenAI.
P.S. This article includes a use…
( 9 min )
More than $1 million in funding available to selected Solver teams and fellows.
( 7 min )
Almost 80% of today’s web content is user-generated, creating a deluge that organizations struggle to analyze with human-only processes. The availability of consumer information helps them make decisions, from buying a new pair of jeans to securing home loans. In a recent survey, 79% of consumers stated they rely on user videos, comments, […]
( 10 min )
Recent developments in deep learning have led to increasingly large models such as GPT-3, BLOOM, and OPT, some of which are already in excess of 100 billion parameters. Although larger models tend to be more powerful, training such models requires significant computational resources. Even with the use of advanced distributed training libraries like FSDP and […]
( 11 min )
submitted by /u/keghn
[link] [comments]
( 40 min )
Things are a lot sunnier these days for designers looking to visualize their projects in NVIDIA Omniverse, a platform for creating and operating metaverse applications.
( 6 min )
We’re launching a classifier trained to distinguish between AI-written and human-written text.
We’ve trained a classifier to distinguish between text written by a human and text written by AIs from a variety of providers. While it is impossible to reliably detect all AI-written text, we believe
( 3 min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10 min )